US10805174B2 - Using machine learning to monitor link quality and predict link faults - Google Patents

Using machine learning to monitor link quality and predict link faults

Info

Publication number
US10805174B2
Authority
US
United States
Prior art keywords
link
class
data model
quality
processors
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
US16/406,251
Other versions
US20190268240A1 (en)
Inventor
Alam Yadav
Madhava N
Saikat Sanyal
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Juniper Networks Inc
Original Assignee
Juniper Networks Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Juniper Networks Inc
Priority to US16/406,251
Assigned to JUNIPER NETWORKS, INC. Assignment of assignors interest (see document for details). Assignors: N, Madhava; Sanyal, Saikat; Yadav, Alam
Publication of US20190268240A1
Application granted
Publication of US10805174B2
Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/147 Network analysis or design for predicting network behaviour
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/16 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks using machine learning or artificial intelligence
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/04 Processing captured monitoring data, e.g. for logfile generation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0823 Errors, e.g. transmission errors
    • H04L43/0847 Transmission error
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/12 Network monitoring probes
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W24/00 Supervisory, monitoring or testing arrangements
    • H04W24/04 Arrangements for maintaining operational condition
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W24/00 Supervisory, monitoring or testing arrangements
    • H04W24/08 Testing, supervising or monitoring using real traffic
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/50 Testing arrangements

Definitions

  • a network device may include internal links and external links.
  • a network device may include internal links that allow traffic flow (e.g., packets) between components of the network device and/or external links that allow traffic flow between network devices.
  • a device may include one or more processors to receive a trained data model.
  • the data model may be trained with historical link quality information associated with a set of links.
  • the data model may include one or more values associated with measuring link quality.
  • the one or more processors may determine, after receiving the trained data model, link quality information associated with a link that is actively supporting traffic flow.
  • the one or more processors may classify the link by using the link quality information as input for the data model.
  • the data model may classify the link into a first class associated with a first measure of link quality, a second class associated with a second measure of link quality, or a third class associated with a third measure of link quality.
  • the one or more processors may determine whether the link is correctly classified by updating the data model with information associated with improving accuracy of classifying the link.
  • the one or more processors may update a class of the link to the first class, the second class, or the third class after determining whether the link is correctly classified.
  • the one or more processors may perform one or more actions associated with improving link quality based on classifying the link and/or updating a class of the link.
  • a non-transitory computer-readable medium may store one or more instructions that, when executed by one or more processors, cause the one or more processors to obtain a data model that is trained using historical link quality information associated with a set of links.
  • the historical link quality information may include one or more values associated with measuring link quality.
  • the one or more instructions may cause the one or more processors to determine, after obtaining the data model, link quality information associated with a link that is actively supporting traffic flow.
  • the one or more instructions may cause the one or more processors to classify the link by using the link quality information as input for the data model.
  • the data model may classify the link into a class of a set of classes associated with measuring link quality.
  • the one or more instructions may cause the one or more processors to determine whether the link is correctly classified by performing one or more actions associated with improving accuracy of classifying the link.
  • the one or more instructions may cause the one or more processors to selectively update the class of the link after determining whether the link is correctly classified.
  • the one or more instructions may cause the one or more processors to perform one or more actions associated with improving link quality based on classifying the link and/or selectively updating the class of the link.
  • a method may include receiving, by a device, a trained data model.
  • the trained data model may be trained using historical link quality information associated with a set of links.
  • the method may include determining, by the device and after receiving the trained data model, link quality information associated with a link that is actively supporting traffic.
  • the method may include classifying the link, by the device, by using the link quality information as input for the data model.
  • the data model may classify the link into a class of a set of classes associated with measuring link quality.
  • the method may include determining, by the device, an actual quality level of the link.
  • the method may include selectively updating the class of the link, by the device, after determining the actual link quality of the link.
  • the method may include performing, by the device, one or more actions associated with improving link quality based on classifying the link and/or selectively updating the class of the link.
  • FIGS. 1A-1D are diagrams of an overview of an example implementation described herein;
  • FIG. 2 is a diagram of an example environment in which systems and/or methods, described herein, may be implemented;
  • FIG. 3 is a diagram of example components of one or more devices of FIG. 2;
  • FIG. 4 is a flow chart of an example process for using machine learning to monitor link quality and detect link faults; and
  • FIGS. 5A-5C are diagrams of one or more example implementations relating to the example process shown in FIG. 4.
  • the amount of traffic that a network device handles may increase. This may lead to higher bandwidth interconnects within a board (e.g., a printed circuit board (PCB)) located within the network device, and may further cause links (e.g., Serializer-Deserializer (SerDes) links) to be run at higher speeds. Furthermore, the board may be densely packed with internal links, causing interference and leading to link degradation over the life of the board. However, many network devices are only able to detect link faults (e.g., link degradation) after the faults occur.
  • a model generation device may use historical link quality information associated with a set of links to train a data model (e.g., by creating a prediction function), and may provide the trained data model (e.g., the prediction function) to a set of network devices.
  • the set of network devices may classify links that actively support traffic flow by using link quality information associated with the links as input for the data model.
  • the network device may perform one or more actions associated with improving accuracy of link classification and/or one or more actions associated with improving link quality (e.g., request to repair a link and/or a board, request to replace a board, adapt to environmental conditions associated with a link, etc.).
  • the set of network devices predict link faults and, by taking one or more pre-emptive actions, eliminate traffic loss via links associated with the set of network devices. Furthermore, by performing one or more actions associated with improving link quality, the set of network devices improve efficiency and reliability of network communications.
  • FIGS. 1A-1D are diagrams of an overview of an example implementation 100 described herein.
  • example implementation 100 may include a model generation device that uses one or more machine learning techniques to train a data model that may be used to monitor link quality and detect link faults.
  • a set of network devices may use the trained data model to perform one or more actions associated with improving accuracy of link classification and/or one or more actions associated with improving link quality.
  • the model generation device may receive, from a data source, historical link quality information.
  • the historical link quality information may include one or more values associated with measuring link quality.
  • the historical link quality information may include one or more actual measures of link quality, one or more predictors of link quality, and/or one or more environment conditions.
  • the one or more actual measures of link quality may include one or more values associated with identifying errors in data transmission, such as a forward error correction (FEC) value or a cyclic redundancy check (CRC) value, one or more values measuring signal integrity, such as a signal-to-noise ratio (SNR) value, and/or the like.
  • the one or more predictors of link quality may include a bit error rate (BER) value, a link eye width value, a link eye height value, a link quality slope, and/or the like.
  • the one or more environment conditions may include a temperature value (e.g., a temperature of the board, a temperature of the system, etc.), a link uptime value, and/or the like.
  • the historical link quality information may be associated with a set of network devices that operate under different environmental conditions, thereby allowing the model generation device to train a data model that accounts for a number of different environmental conditions.
  • the model generation device may train a data model.
  • the model generation device may train the data model by creating a prediction function that is able to classify links that are actively supporting traffic flow, as described further herein.
  • the model generation device may create the prediction function by associating values included in the historical link quality information with quality coefficient values (e.g., values indicating a particular link quality measurement). Additionally, the model generation device may configure one or more weight values that may be used in determining an overall link quality score.
  • assume that the historical link quality information includes five values (e.g., value A, value B, value C, value D, and value E). Further assume that value A has a positive influence on link quality (e.g., a high A value is a strong indicator of high link quality), that values B-D do not have a high influence on link quality, and that value E has a negative influence on link quality (e.g., a high E value is a strong indicator of low link quality).
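The five-value example can be sketched as a weighted sum. This is only an illustration under stated assumptions: the weight values, the normalization of each quality coefficient to the range 0 to 1, and the function name `overall_link_quality_score` are hypothetical and do not come from the description.

```python
# Hypothetical weights: A has a strong positive influence, B-D have little
# influence, and E has a strong negative influence on link quality.
WEIGHTS = {"A": 0.6, "B": 0.05, "C": 0.05, "D": 0.05, "E": -0.6}


def overall_link_quality_score(coefficients, weights=WEIGHTS):
    """Return a weighted sum of quality coefficient values.

    `coefficients` maps each value name (A-E) to a quality coefficient,
    assumed here to be normalized to the range 0..1.
    """
    return sum(weights[name] * coefficients[name] for name in weights)


# A link with a high A value and a low E value scores higher than a link
# with a low A value and a high E value.
good = overall_link_quality_score({"A": 0.9, "B": 0.5, "C": 0.5, "D": 0.5, "E": 0.1})
bad = overall_link_quality_score({"A": 0.2, "B": 0.5, "C": 0.5, "D": 0.5, "E": 0.9})
print(good > bad)
```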
  • the model generation device may provide the data model to a set of network devices (shown as network device 1 through network device N).
  • the model generation device may provide the prediction function to the set of network devices to serve as an initial data model that each network device may implement when monitoring traffic flow.
  • the set of network devices may receive a trained data model that may be used to classify links that actively support traffic flow.
  • FIGS. 1B-1D show a first network device (shown as network device 1) performing one or more actions associated with improving accuracy of link classification and/or one or more actions associated with improving link quality.
  • the set of network devices each perform one or more actions associated with improving accuracy of link classification and/or one or more actions associated with improving link quality.
  • the first network device may analyze traffic associated with a link to determine link quality information. For example, traffic may be passing through a link, and the first network device may use one or more performance monitoring techniques to determine link quality information for the link.
  • the first network device may determine a BER value by comparing bit string values before and after the bit string travels through the link.
  • the first network device may use one or more sensors to measure temperature values associated with a board.
  • the first network device may monitor traffic to determine a noise value and a signal value, and may process the values to determine an SNR value. In this way, the first network device may determine link quality information for a link that is actively supporting traffic flow.
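The BER and SNR measurements described above might be computed as in the following minimal sketch. The function names and the choice to express SNR in decibels are assumptions made for illustration.

```python
import math


def bit_error_rate(sent_bits, received_bits):
    """Fraction of bits that differ before and after traversing the link."""
    if len(sent_bits) != len(received_bits) or not sent_bits:
        raise ValueError("bit strings must be non-empty and of equal length")
    errors = sum(s != r for s, r in zip(sent_bits, received_bits))
    return errors / len(sent_bits)


def snr_db(signal_power, noise_power):
    """Signal-to-noise ratio in decibels, from measured power values."""
    return 10.0 * math.log10(signal_power / noise_power)


# Two of eight bits differ, so the BER is 0.25.
print(bit_error_rate("10110010", "10100011"))
# A signal 100 times stronger than the noise gives an SNR of 20 dB.
print(snr_db(100.0, 1.0))
```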
  • the first network device may classify the link using the data model.
  • the first network device may classify the link by using the link quality information as input to the data model, which may cause the data model to output an overall link quality score.
  • the overall link quality score may be associated with one or more link quality classes, such as a class associated with high link quality, a class associated with marginal link quality, or a class associated with low link quality.
  • the data model may classify the link into a class associated with marginal link quality.
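One way to map an overall link quality score to the three classes is a pair of thresholds. The threshold values and the name `classify_link` below are hypothetical; the description does not say how scores are mapped to classes.

```python
# Hypothetical score boundaries between the three link quality classes.
HIGH_THRESHOLD = 0.4
LOW_THRESHOLD = 0.0


def classify_link(score):
    """Map an overall link quality score to a link quality class."""
    if score >= HIGH_THRESHOLD:
        return "high"
    if score >= LOW_THRESHOLD:
        return "marginal"
    return "low"


print(classify_link(0.2))  # falls between the two thresholds
```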
  • the first network device may perform one or more actions associated with improving accuracy of link classification and/or one or more actions associated with improving link quality, as described further herein.
  • the first network device may determine an actual quality level of the link. For example, the first network device may determine an actual quality level of the link while the link is active and/or may determine an actual quality level of the link after disabling the link. As an example, the first network device may determine an actual quality level of the link while the link is active by executing non-intrusive techniques such as an FEC technique, a CRC technique, or the like. As another example, the first network device may disable the link after traffic has been redirected to another link (so as to prevent traffic drop on the link), and may determine an actual quality level of the link by performing a diagnostic test on the link.
  • the first network device may use a pseudorandom binary sequence (PRBS) test to monitor bits that travel through the link.
  • the link may pass or fail the PRBS test based on a number of bits that successfully travel through the link.
  • assume that a threshold number of bits change value (e.g., from 0 to 1 or from 1 to 0) as the bits travel through the link. This may cause the link to fail the diagnostic test, indicating that the link is a low quality link.
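A PRBS-based diagnostic test can be sketched as follows, assuming a PRBS7 pattern (generator polynomial x^7 + x^6 + 1) and a link modeled as a Python callable. The pattern length, the error threshold, and the two example link functions are hypothetical.

```python
def prbs7(n_bits, seed=0x7F):
    """Generate n_bits of a PRBS7 sequence with a 7-bit shift register."""
    state = seed & 0x7F
    if state == 0:
        raise ValueError("seed must be non-zero")
    bits = []
    for _ in range(n_bits):
        new_bit = ((state >> 6) ^ (state >> 5)) & 1
        bits.append(new_bit)
        state = ((state << 1) | new_bit) & 0x7F
    return bits


def run_prbs_test(link, n_bits=1000, error_threshold=5):
    """Send a PRBS pattern through `link` and count bits that changed value.

    The link fails when the number of changed bits reaches the threshold.
    Returns (passed, bit_errors).
    """
    sent = prbs7(n_bits)
    received = link(sent)
    bit_errors = sum(s != r for s, r in zip(sent, received))
    return bit_errors < error_threshold, bit_errors


def clean_link(bits):
    """Hypothetical healthy link: every bit arrives unchanged."""
    return list(bits)


def noisy_link(bits):
    """Hypothetical degraded link: every 100th bit is flipped."""
    return [b ^ 1 if i % 100 == 0 else b for i, b in enumerate(bits)]


print(run_prbs_test(clean_link))
print(run_prbs_test(noisy_link))
```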
  • the first network device may determine whether the link is correctly classified. For example, the first network device may compare the classification of the link and the actual quality level of the link (e.g., as indicated by the diagnostic test). If the classification of the link and the actual quality level of the link do not match (i.e., the link is incorrectly classified), then the first network device may update the data model by modifying one or more quality coefficient values to improve subsequent classifications of the link. For example, the link may be classified as a marginal quality link (e.g., as a result of the training data), but may have an actual link quality level of low due to environment conditions that are specific to the first network device. In this way, the first network device may update the data model to dynamically adapt to the environment conditions that are specific to the first network device.
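The description does not specify how the quality coefficient values are modified. One simple possibility, shown here purely as an assumption, is an error-driven update that nudges each weight in proportion to its input whenever the predicted class differs from the actual quality level. The class ranking, learning rate, and function names are all hypothetical.

```python
# Hypothetical ordering of classes, used to measure the direction of a miss.
CLASS_RANK = {"low": 0, "marginal": 1, "high": 2}


def score(weights, inputs):
    """Weighted sum used as the overall link quality score."""
    return sum(weights[name] * inputs[name] for name in weights)


def update_weights(weights, inputs, predicted, actual, learning_rate=0.1):
    """Return adjusted weights when the predicted class was wrong.

    If the actual quality is lower than predicted, the weights move so that
    the same (non-negative) inputs yield a lower score next time, and vice
    versa. A correct prediction leaves the weights unchanged.
    """
    error = CLASS_RANK[actual] - CLASS_RANK[predicted]
    if error == 0:
        return dict(weights)
    return {
        name: w + learning_rate * error * inputs[name]
        for name, w in weights.items()
    }


weights = {"A": 0.6, "E": -0.6}
inputs = {"A": 0.5, "E": 0.5}
# The model predicted "marginal", but the diagnostic test indicated "low".
tuned = update_weights(weights, inputs, predicted="marginal", actual="low")
print(score(tuned, inputs) < score(weights, inputs))
```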
  • the first network device may update the class of the link.
  • the first network device may update the class of the link based on a result of the diagnostic test.
  • the first network device may update the class of the link to a different class than the previous classification.
  • the first network device may update the class of the link to a class associated with high link quality if the link passes the diagnostic test, and may update the class of the link to a class associated with low link quality if the link fails the diagnostic test.
  • the first network device may update the class of the link by re-using the link quality information as input for the data model.
  • the data model may output a more accurate link quality prediction as a result of modifying the one or more quality coefficient values associated with the data model.
  • the first network device may update the class of the link to a class associated with low link quality.
  • by determining an actual quality level of the link and updating the class of the link, the first network device is able to improve accuracy of link classification. Furthermore, if the link is classified or reclassified as a low quality link, the first network device may disable the link to prevent subsequent traffic from suffering from packet loss.
  • the first network device may determine whether the link has received a threshold number of classifications associated with low link quality. For example, the first network device may monitor the link over an interval, and may periodically classify the link throughout the interval. In this case, if the first network device classifies the link as a low quality link, then the first network device may compare a current number of times the link is classified as a low quality link to a threshold number of classifications associated with low link quality. In this way, the first network device is able to identify whether to blacklist the link (i.e., prevent the link from being able to actively support traffic flow) or to monitor and test the link.
  • the first network device may monitor and test the link. For example, the first network device may continue to monitor the link, perform an additional diagnostic test on the link, execute additional FEC techniques and/or update the data model to improve accuracy of link classification, and/or the like. In this way, the first network device is able to verify whether the link is a low quality link before blacklisting the link and/or performing additional actions associated with improving link quality.
  • the first network device may blacklist the link.
  • the first network device may blacklist the link to prevent the link from being able to actively support traffic flow. By blacklisting the link, the first network device avoids traffic drops and link faults that may result if the low quality link continues supporting traffic flow, thereby improving quality of network communications.
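The choice between monitoring and blacklisting can be sketched as a per-link counter compared against a threshold. The class name `LinkMonitor`, the threshold value, and the returned action labels are hypothetical.

```python
from collections import defaultdict


class LinkMonitor:
    """Track low-quality classifications per link over a monitoring interval."""

    def __init__(self, low_quality_threshold=3):
        self.low_quality_threshold = low_quality_threshold
        self.low_counts = defaultdict(int)
        self.blacklisted = set()

    def record_classification(self, link_id, link_class):
        """Record one periodic classification and return the action to take."""
        if link_id in self.blacklisted:
            return "blacklisted"
        if link_class != "low":
            return "ok"
        self.low_counts[link_id] += 1
        if self.low_counts[link_id] >= self.low_quality_threshold:
            # Threshold reached: stop the link from supporting traffic flow.
            self.blacklisted.add(link_id)
            return "blacklisted"
        # Below the threshold: keep monitoring and testing the link.
        return "monitor_and_test"


monitor = LinkMonitor(low_quality_threshold=3)
actions = [
    monitor.record_classification("link-1", c)
    for c in ["low", "marginal", "low", "low"]
]
print(actions)
```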
  • the first network device may generate a recommendation.
  • the first network device may generate a recommendation to repair the link (e.g., an internal link, an external link) and/or the board, replace the link (e.g., an external link) and/or the board, or the like.
  • the recommendation may include the link quality information associated with the link, results of the diagnostic test, and/or instructions indicating a particular action to be performed (e.g., repair the link, repair the board, replace the link, replace the board, etc.).
  • the first network device may provide the recommendation to a network management device associated with an interested party (e.g., a technician).
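A recommendation carrying the link quality information, the diagnostic results, and the action to perform might be structured as below. The field names, the example values, and the JSON encoding are assumptions; the description does not define a message format.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Recommendation:
    """Hypothetical recommendation sent to a network management device."""

    link_id: str
    link_class: str
    action: str  # e.g., "repair_link", "repair_board", "replace_board"
    link_quality_info: dict
    diagnostic_results: dict

    def to_json(self):
        """Serialize the recommendation for transmission."""
        return json.dumps(asdict(self), sort_keys=True)


# All values below are made up for illustration.
rec = Recommendation(
    link_id="inter-board-link-3",
    link_class="low",
    action="replace_board",
    link_quality_info={"ber": 1e-6, "snr_db": 14.2, "board_temp_c": 71.0},
    diagnostic_results={"test": "PRBS", "passed": False, "bit_errors": 10},
)
payload = rec.to_json()
print(payload)
```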
  • the interested party may repair and/or replace the link and/or the board, thereby improving network communications relative to allowing a low quality link to continue to support traffic flow.
  • the first network device may generate and provide a recommendation for a link that is classified or reclassified into a class associated with marginal link quality. In this way, the interested party may repair and/or replace the link and/or the board prior to packet drop and/or signal loss.
  • the first network device may continue tuning the data model. For example, the first network device may continue to monitor traffic flow and modify the data model to improve accuracy of link classification, adapt to environmental conditions associated with the link, use a forecasting technique to identify a link fault prior to the fault occurring, and/or the like.
  • the first network device is able to manage link quality and predict link faults prior to the occurrence of the link faults.
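The description mentions a forecasting technique without naming one. As an illustrative assumption, a least-squares line fitted to recent samples of a degradation metric (for example, an error count per interval) can be extrapolated to estimate when the metric would cross a fault threshold. The function name and the sample values are hypothetical.

```python
def forecast_intervals_until_fault(history, fault_threshold):
    """Estimate how many sampling intervals remain before a fault.

    Fits a least-squares line to `history` (one sample per interval) and
    extrapolates to `fault_threshold`. Returns None when the metric is not
    trending upward, so no fault is forecast.
    """
    n = len(history)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(history) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(history))
    slope = sxy / sxx
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    crossing = (fault_threshold - intercept) / slope
    # Intervals remaining after the most recent sample (index n - 1).
    return max(0.0, crossing - (n - 1))


# The metric rises by one unit per interval and would reach 10 in six more.
print(forecast_intervals_until_fault([1, 2, 3, 4], fault_threshold=10))
```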
  • FIGS. 1A-1D are provided merely as an example. Other examples are possible and may differ from what was described with regard to FIGS. 1A-1D.
  • FIG. 2 is a diagram of an example environment 200 in which systems and/or methods, described herein, may be implemented.
  • environment 200 may include one or more data sources 210, a model generation device 220, one or more network devices 230-1 through 230-N (N ≥ 1) (hereinafter referred to collectively as “network devices 230”, and individually as “network device 230”), a network management device 240, and/or a network 250.
  • Devices of environment 200 may interconnect via wired connections, wireless connections, or a combination of wired and wireless connections.
  • Data source 210 includes one or more devices capable of receiving, storing, and/or providing historical link quality information.
  • data source 210 may include a server device or a group of server devices.
  • data source 210 may interact with a set of network devices 230 (or other devices that may monitor link quality) to receive link quality information and store the link quality information as historical link quality information.
  • data source 210 may store historical link quality information associated with a set of links.
  • data source 210 may provide the historical link quality information to model generation device 220 and/or network device 230.
  • Model generation device 220 includes one or more devices capable of receiving, storing, processing, and/or providing information associated with link quality.
  • model generation device 220 may include a server device or a group of server devices.
  • model generation device 220 may be implemented as a cloud platform.
  • model generation device 220 may be implemented as a server device (e.g., an on-site server device).
  • model generation device 220 may receive historical link quality information from data source 210.
  • model generation device 220 may process the historical link quality information to train a data model (e.g., by creating a prediction function).
  • model generation device 220 may provide a trained data model (e.g., the prediction function) to network device 230.
  • Network device 230 includes one or more devices capable of receiving, processing, forwarding, and/or transferring information associated with a link.
  • network device 230 may include a router, such as a label switching router (LSR), a label edge router (LER), an ingress router, an egress router, a provider router (e.g., a provider edge router, a provider core router, etc.), a virtual router, or the like.
  • network device 230 may include a gateway, a switch, a firewall, a hub, a bridge, a reverse proxy, a server (e.g., a proxy server, a cloud server, a data center server, etc.), a load balancer, or a similar device.
  • network device 230 may be a physical device implemented within a housing, such as a chassis.
  • network device 230 may be a virtual device implemented by one or more computer devices of a cloud computing environment or a data center.
  • network device 230 may include or connect to a set of links, and the set of links may be monitored (e.g., using a hardware component, such as a sensor or a tap, using a software module, etc.) to identify link quality information associated with the set of links.
  • network device 230 may use a data model (e.g., a prediction function) to classify the link, and may perform one or more actions associated with improving accuracy of link classification (e.g., as shown in FIG. 2 by feedback loops, network devices 230 may tune the data model) and/or one or more actions associated with improving link quality.
  • network device 230 may provide link quality information to data source 210 (e.g., which may be stored as historical link quality information).
  • network device 230 may provide a recommendation to repair a link to network management device 240.
  • Network management device 240 includes one or more devices capable of receiving, storing, processing, and/or providing information associated with a link.
  • network management device 240 may include a computing device, such as a desktop computer, a laptop computer, a tablet computer, a server device, a mobile phone (e.g., a smart phone or a radiotelephone), a wearable computer (e.g., a smart watch, a smart band, a smart pair of eyeglasses, etc.), a sensor device, or a similar type of device.
  • network management device 240 may receive, from network device 230, a recommendation to repair a link, to replace a board associated with the link, or the like. In this case, network management device 240 may schedule an appointment to repair the link or to replace a board associated with the link, and/or may provide a response message to network device 230 indicating that the appointment has been scheduled.
  • Network 250 includes one or more wired and/or wireless networks.
  • network 250 may include a cellular network (e.g., a fifth generation (5G) network, a fourth generation (4G) network, such as a long-term evolution (LTE) network, a third generation (3G) network, a code division multiple access (CDMA) network, etc.), a public land mobile network (PLMN), a local area network (LAN), a wide area network (WAN), a metropolitan area network (MAN), a telephone network (e.g., the Public Switched Telephone Network (PSTN)), a private network, an ad hoc network, an intranet, the Internet, a fiber optic-based network, a cloud computing network, or the like, and/or a combination of these or other types of networks.
  • the number and arrangement of devices and networks shown in FIG. 2 are provided as an example. In practice, there may be additional devices and/or networks, fewer devices and/or networks, different devices and/or networks, or differently arranged devices and/or networks than those shown in FIG. 2. Furthermore, two or more devices shown in FIG. 2 may be implemented within a single device, or a single device shown in FIG. 2 may be implemented as multiple, distributed devices. Additionally, or alternatively, a set of devices (e.g., one or more devices) of environment 200 may perform one or more functions described as being performed by another set of devices of environment 200.
  • FIG. 3 is a diagram of example components of a device 300.
  • Device 300 may correspond to data source 210, model generation device 220, network device 230, and/or network management device 240.
  • data source 210, model generation device 220, network device 230, and/or network management device 240 may include one or more devices 300 and/or one or more components of device 300.
  • device 300 may include a switching fabric 310, one or more line cards 320, one or more links (e.g., a set of external links 330, a set of inter-board links 340, a set of intra-board (onboard) links 342, etc.), and/or a controller 350.
  • traffic between switching fabric 310 and controller 350 may be provided and/or received through a single inter-board link 340.
  • traffic between switching fabric 310 and controller 350 may be provided and/or received through a set of inter-board links 340, where each inter-board link 340 may be designated for a subset of external links 330 and/or a subset of line cards 320.
  • line card 320 may use a set of inter-board links 340 to communicate with one or more corresponding planes of a switching fabric 310.
  • Switching fabric 310 interconnects external links 330 via line cards 320.
  • switching fabric 310 may be implemented using one or more switching fabric components 312 (e.g., one or more crossbars, one or more busses, one or more shared memories, and/or one or more planes).
  • switching fabric components 312 may be connected using intra-board (onboard) links 342.
  • switching fabric 310 may enable external links 330 to communicate. For example, switching fabric 310 may connect with one or more line cards 320 via a set of inter-board links 340, and the one or more line cards 320 may connect with the external links 330, as described further herein.
  • Line cards 320 include one or more line card components 322 .
  • line cards 320 may include a modular electronic circuit designed to fit on a PCB, and may include one or more line card components 322 (e.g., a packet processing component, a re-timer, etc.).
  • a packet processing component may include one or more processors to process packets, and may process incoming traffic, such as by performing data link layer encapsulation or decapsulation.
  • a packet processing component may receive a packet from switching fabric 310 , may process the packet, and may output the processed packet to an appropriate external link 330 connected to line card 320 .
  • a packet processing component may receive a packet from an external link 330 , may process the packet, and may output the processed packet to switching fabric 310 for transfer to controller 350 and/or to another external link 330 (e.g., via the same packet processing component or a different packet processing component).
  • External link 330 is a point of attachment for physical links, and may be a point of ingress and/or egress for incoming and/or outgoing traffic, such as packets.
  • a single line card 320 may be connected to multiple external links 330 .
  • a single line card 320 may be connected to a single external link 330 .
  • An external link 330 may permit communication between a first network device 230 and a second network device 230 that is a neighbor of the first network device 230 .
  • External link 330 may store packets (e.g., in a buffer) and/or may schedule packets for transmission on output physical links.
  • External link 330 may support data link layer encapsulation or decapsulation and/or a variety of higher-level protocols.
  • Inter-board link 340 is a path that allows line card 320 and/or controller 350 to communicate with switching fabric 310 .
  • Inter-board link 340 may include, for example, a wired or wireless path, such as a fiber-optic path, an electrical path, or the like.
  • Intra-board (onboard) link 342 is a path that allows interconnection between line card components 322 and/or switching fabric components 312 .
  • Controller 350 includes a processor in the form of, for example, a central processing unit (CPU), a graphics processing unit (GPU), an accelerated processing unit (APU), a microprocessor, a microcontroller, a digital signal processor (DSP), a field-programmable gate array (FPGA), an application-specific integrated circuit (ASIC), and/or another type of processor.
  • the processor is implemented in hardware, firmware, or a combination of hardware and software.
  • controller 350 may include one or more processors that may be programmed to perform a function.
  • controller 350 may include a group of virtual devices that each includes one or more processors.
  • controller 350 may include a random access memory (RAM), a read only memory (ROM), and/or another type of dynamic or static storage device (e.g., a flash memory, a magnetic memory, an optical memory, etc.) that stores information and/or instructions for use by controller 350 .
  • controller 350 may communicate with other devices, networks, and/or systems connected to device 300 to exchange information regarding network topology. Controller 350 may create routing tables based on the network topology information, create forwarding tables based on the routing tables, and forward the forwarding tables to line card 320 , such as for use in performing route lookups for incoming and/or outgoing packets.
  • Controller 350 may perform one or more processes described herein. Controller 350 may perform these processes in response to executing software instructions stored by a non-transitory computer-readable medium.
  • a computer-readable medium is defined herein as a non-transitory memory device.
  • a memory device includes memory space within a single physical storage device or memory space spread across multiple physical storage devices.
  • Software instructions may be read into a memory and/or a storage component associated with controller 350 from another computer-readable medium or from another device via a communication interface. When executed, software instructions stored in a memory and/or storage component associated with controller 350 may cause controller 350 to perform one or more processes described herein. Additionally, or alternatively, hardwired circuitry may be used in place of or in combination with software instructions to perform one or more processes described herein. Thus, implementations described herein are not limited to any specific combination of hardware circuitry and software.
  • device 300 may include additional components, fewer components, different components, or differently arranged components than those shown in FIG. 3 . Additionally, or alternatively, a set of components (e.g., one or more components) of device 300 may perform one or more functions described as being performed by another set of components of device 300 .
  • FIG. 4 is a flow chart of an example process 400 for using machine learning to monitor link quality and detect link faults.
  • one or more process blocks of FIG. 4 may be performed by network device 230 .
  • one or more process blocks of FIG. 4 may be performed by another device or a group of devices separate from or including network device 230 , such as data source 210 , model generation device 220 , and/or network management device 240 .
  • process 400 may include receiving historical link quality information associated with a set of links (block 410 ).
  • model generation device 220 may receive, from data source 210 , historical link quality information associated with a set of links (e.g., a set of internal links within a network device, a set of external links between devices, etc.).
  • a link may be an internal link that provides a connection between components and/or modules of a device (e.g., hardware components, software modules, etc.) or may be an external link that provides a connection between devices.
  • data source 210 may store historical link quality information.
  • data source 210 may store historical link quality information using a data structure, such as an array, a linked-list, a tree, a graph, a hash table, a database, and/or the like.
  • data source 210 may store large quantities of data. For example, data source 210 may store thousands, millions, billions, or even trillions of data points.
  • model generation device 220 may receive historical link quality information.
  • Historical link quality information may include one or more actual measures of link quality, one or more predictors of link quality, and/or one or more environment conditions.
  • the one or more actual measures of link quality may include one or more values associated with identifying errors in data transmission (e.g., a forward error correction (FEC) value, a cyclic redundancy check (CRC) value, etc.), one or more values measuring signal integrity (e.g., a signal-to-noise ratio (SNR) value), and/or the like.
  • the one or more predictors of link quality may include a bit error rate (BER) value, a link eye width value, a link eye height value, a link quality slope, and/or the like.
  • the one or more environment conditions may include a temperature value (e.g., a temperature of the board, a temperature of the system, etc.), a link uptime value, and/or the like.
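  • As an illustration only, the three categories of historical link quality information listed above could be carried in a single record per observation. The sketch below is a hypothetical schema: the class name, every field name, and the units are assumptions made for the example and are not defined by this description.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class LinkQualitySample:
    """One historical observation for one link (hypothetical schema)."""
    link_id: str
    # Actual measures of link quality
    fec_corrected_errors: int   # assumed meaning of the "FEC value"
    crc_errors: int             # assumed meaning of the "CRC value"
    snr_db: float
    # Predictors of link quality
    ber: float
    eye_width: float            # units assumed (e.g., unit intervals)
    eye_height: float           # units assumed (e.g., millivolts)
    quality_slope: float
    # Environment conditions
    temperature_c: float
    uptime_s: int


sample = LinkQualitySample(
    link_id="inter-board-340-0",
    fec_corrected_errors=12,
    crc_errors=0,
    snr_db=24.5,
    ber=1e-12,
    eye_width=0.42,
    eye_height=180.0,
    quality_slope=-0.01,
    temperature_c=61.0,
    uptime_s=86_400,
)

# A flat mapping of feature name to value, suitable as model input.
features = asdict(sample)
```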
  • model generation device 220 may receive historical link quality information for a set of network devices 230 .
  • model generation device 220 may receive historical link quality information for a set of network devices 230 that operate under different environment conditions. By receiving historical link quality information for devices operating under different environment conditions, the model generation device 220 may process the historical link quality information to train a data model, as described further herein.
  • model generation device 220 may receive historical link quality information associated with a set of links, such that the information may be used to train a data model.
  • process 400 may include training a data model using the historical link quality information (block 420 ).
  • model generation device 220 may, using a machine learning technique, train a data model that may be used to classify a link, as described further herein.
  • model generation device 220 may train a data model for use by a set of network devices 230 .
  • model generation device 220 may train a data model by creating a prediction function, and may provide the prediction function to the set of network devices 230 .
  • model generation device 220 may create the prediction function by associating values included in the historical link quality information with dynamic quality coefficient values (e.g., which may be positive or negative).
  • a quality coefficient value may be a value indicating a particular link quality measurement.
  • the quality coefficient values may be grouped into classes associated with high link quality, marginal link quality, or low link quality.
  • model generation device 220 may create the prediction function by configuring one or more weight values that may be used in determining an overall link quality score. For example, model generation device 220 may assign weights to particular quality coefficient values. In this way, model generation device 220 may use the weighted quality coefficient values to determine an overall link quality score.
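  • A minimal sketch of such a prediction function follows, assuming the overall link quality score is a plain weighted sum of per-metric quality coefficient values. The description does not state how the weights are combined, so the weighted sum, the function name, and the numeric values are all assumptions.

```python
def overall_link_quality_score(coefficients, weights):
    """Combine per-metric quality coefficients into one score.

    Assumption: the score is a weighted sum. Coefficients may be
    positive or negative, as noted in the description.
    """
    missing = set(coefficients) - set(weights)
    if missing:
        raise KeyError(f"no weight configured for: {sorted(missing)}")
    return sum(weights[name] * value for name, value in coefficients.items())


# Hypothetical coefficient and weight values for one link.
coefficients = {"ber": 0.8, "eye_height": 0.5, "temperature": -0.3}
weights = {"ber": 0.5, "eye_height": 0.3, "temperature": 0.2}

score = overall_link_quality_score(coefficients, weights)
```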
  • model generation device 220 may train a data model using a supervised machine learning technique. Additionally, or alternatively, model generation device 220 may train a data model using a different type of machine learning technique, such as machine learning via clustering, dimensionality reduction, structured prediction, anomaly detection, neural networks, reinforcement learning, or the like.
  • model generation device 220 may, using the historical link quality information, train a data model that may be used by a set of network devices 230 to classify a set of links that actively support traffic flow. While the set of network devices 230 may utilize the data model to classify a set of links that actively support traffic flow, implementations described herein describe a data flow associated with a single network device 230 and a single link to illustrate the example process.
  • process 400 may include providing the data model to a network device (block 430 ).
  • model generation device 220 may provide the data model (e.g., that includes the prediction function) to network device 230 .
  • network device 230 may use the data model to classify a link that is actively supporting traffic flow.
  • process 400 may include classifying a link that is actively supporting traffic flow by using link quality information associated with the link as input for the data model (block 440 ).
  • network device 230 may receive the data model, and may determine link quality information associated with a link that is actively supporting traffic flow, and may use the link quality information as input for the data model.
  • the data model may output an overall link quality score that serves to classify the link into a class associated with high link quality, a class associated with marginal link quality, or a class associated with low link quality.
  • the data model may use a different classification scheme, such as a classification scheme with more classes, a classification scheme with fewer classes, a classification scheme focused on a different metric (e.g., a subset of link quality, such as a particular link quality metric), or the like.
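  • The three-class scheme can be sketched as a threshold comparison on the overall link quality score. The threshold values, the enum, and the function name below are assumptions chosen for illustration; the description does not specify them.

```python
from enum import Enum


class LinkClass(Enum):
    HIGH = "high"
    MARGINAL = "marginal"
    LOW = "low"


def classify(score, high_threshold=0.6, low_threshold=0.2):
    """Map an overall link quality score to a class.

    Assumption: a larger score means better quality, and the two
    thresholds are hypothetical tuning parameters.
    """
    if low_threshold > high_threshold:
        raise ValueError("low_threshold must not exceed high_threshold")
    if score >= high_threshold:
        return LinkClass.HIGH
    if score >= low_threshold:
        return LinkClass.MARGINAL
    return LinkClass.LOW
```

  • A scheme with more or fewer classes would only change the number of thresholds compared against the score.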
  • network device 230 may determine link quality information associated with a link that is actively supporting traffic flow. For example, network device 230 may use one or more techniques to determine link quality for the link and/or monitor environment conditions. In some cases, network device 230 may use different quality monitoring techniques to monitor different values included in the link quality information. As an example, network device 230 may monitor bits traveling through the link to determine a BER value. As another example, network device 230 may use one or more sensors to monitor and measure temperature values associated with the board, an ASIC, etc. In this way, network device 230 may determine link quality information for a link that is actively supporting traffic flow.
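  • As a simplified illustration of deriving a BER value from monitored bits, the sketch below compares a transmitted byte string with the received one. It assumes the transmitted pattern is known to the receiver, which is a simplification for the example; the function name is hypothetical.

```python
def bit_error_rate(sent: bytes, received: bytes) -> float:
    """Fraction of bits that differ between sent and received data."""
    if len(sent) != len(received):
        raise ValueError("sent and received must have the same length")
    if not sent:
        raise ValueError("at least one byte is required")
    # XOR exposes differing bits; count the set bits in each byte.
    bit_errors = sum(bin(a ^ b).count("1") for a, b in zip(sent, received))
    return bit_errors / (8 * len(sent))


sent = bytes([0b10101010, 0xFF])
received = bytes([0b10101011, 0xFF])  # one flipped bit
ber = bit_error_rate(sent, received)
```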
  • network device 230 may classify a link into a class associated with high link quality. For example, network device 230 may use, as input for the data model, link quality information that includes one or more values associated with high link quality (e.g., a low BER value, a low FEC value, etc.). In this case, the data model may output an overall link quality score associated with high link quality. In some implementations, network device 230 may classify a link into a class associated with marginal link quality or low link quality in the same manner described above.
  • network device 230 is able to classify the link into a class associated with a particular quality level, and may prevent link failure and traffic loss by disabling links with a marginal or low link quality.
  • process 400 may include disabling the link if the link is classified into a class associated with marginal link quality or into a class associated with low link quality (block 450 ).
  • network device 230 may classify the link into a class associated with marginal link quality or low link quality, may redirect traffic to avoid traffic flow via the link, and may disable the link.
  • network device 230 may determine an actual quality level of the link by executing an FEC technique, a CRC technique, or the like. For example, network device 230 may execute an FEC technique, a CRC technique, or the like, to determine whether an output of the data model is a false prediction (e.g., a false positive, a false neutral, a false negative). In this case, if network device 230 detects a false prediction, then network device 230 may update the data model, thereby improving accuracy of subsequent link predictions.
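  • One way to picture the CRC-based check is shown below, using the CRC-32 function from the Python standard library. The frame layout (payload followed by a 4-byte big-endian CRC) and the rule that a single failed frame contradicts a high-quality prediction are assumptions made for the sketch.

```python
import zlib


def append_crc(payload: bytes) -> bytes:
    """Return the payload followed by its CRC-32 (assumed frame layout)."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def crc_ok(frame: bytes) -> bool:
    """Recompute the CRC-32 over the payload and compare to the trailer."""
    if len(frame) < 4:
        return False
    payload, trailer = frame[:-4], frame[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") == trailer


def is_false_positive(predicted_class: str, frames) -> bool:
    """True if a 'high' prediction is contradicted by any CRC failure."""
    return predicted_class == "high" and any(not crc_ok(f) for f in frames)


good = append_crc(b"hello")
corrupted = bytes([good[0] ^ 0x01]) + good[1:]  # flip one payload bit
```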
  • network device 230 may disable the link. For example, assume the link is classified into a class associated with marginal link quality or low link quality. In this case, network device 230 may redirect traffic to avoid traffic flow via the link (e.g., by assigning one or more additional links to support traffic flow associated with the link classified as marginal link quality or low link quality). Additionally, network device 230 may disable the link to allow one or more actions to be executed to improve accuracy of link classification.
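  • The redirect-then-disable step might look like the following sketch. The round-robin reassignment policy, the flow table represented as a dictionary, and all names are assumptions for illustration and are not prescribed by this description.

```python
def disable_link(link_id, active_links, flows):
    """Move flows off link_id onto the remaining links, then drop it.

    Assumptions: flows maps a flow name to the link carrying it, and
    displaced flows are spread round-robin over the remaining links.
    Returns the new active-link list and the new flow table.
    """
    remaining = [link for link in active_links if link != link_id]
    if not remaining:
        raise RuntimeError("no alternate link available for redirection")
    displaced = [flow for flow, link in flows.items() if link == link_id]
    new_flows = dict(flows)
    for index, flow in enumerate(displaced):
        new_flows[flow] = remaining[index % len(remaining)]
    return remaining, new_flows


active = ["l1", "l2", "l3"]
flows = {"a": "l1", "b": "l2", "c": "l2"}
active_after, flows_after = disable_link("l2", active, flows)
```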
  • process 400 may include performing, after disabling the link, one or more actions associated with improving accuracy of link classification (block 460 ).
  • network device 230 may determine an actual quality of the link (e.g., by performing a diagnostic test, etc.), update the data model (e.g., by modifying quality coefficient values, threshold values, weight values, etc.), update the class of the link, and/or the like.
  • network device 230 may determine an actual quality level of the link by performing a diagnostic test. For example, network device 230 may perform a pseudorandom binary sequence (PRBS) test to determine an actual quality level of the link (e.g., high quality, marginal quality, low quality, etc.), and may compare the actual quality level of the link and the link classification to determine whether the link is correctly classified.
  • the PRBS test may be able to determine an actual quality level of the link by determining how many bits are able to accurately travel through the link (e.g., whether a zero bit stays a zero bit from a first side of the link to a second side of the link, whether a one bit stays a one bit from a first side of the link to a second side of the link, and/or the like).
  • the PRBS test may use thresholds to determine an actual quality level of the link. If a first threshold amount of bits accurately travel through the link, then the PRBS test may determine that the link is associated with high link quality. If a second threshold amount of bits accurately travel through the link, then the PRBS test may determine that the link is associated with marginal link quality. If a third threshold amount of bits accurately travel through the link, then the PRBS test may determine that the link is associated with low link quality. In this way, the PRBS test may compare the actual quality of the link to the link classification to determine whether the link is correctly classified.
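  • The threshold logic of the PRBS test can be sketched as follows. The generator uses the common PRBS-7 pattern (polynomial x^7 + x^6 + 1), which is an assumption because the description does not name a pattern; the threshold fractions and function names are likewise hypothetical.

```python
def prbs7(n_bits, seed=0x7F):
    """Generate n_bits of a PRBS-7 sequence (x^7 + x^6 + 1)."""
    state = seed & 0x7F
    if state == 0:
        raise ValueError("seed must be non-zero in its low 7 bits")
    bits = []
    for _ in range(n_bits):
        new_bit = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | new_bit) & 0x7F
        bits.append(new_bit)
    return bits


def prbs_quality(sent, received, high=0.999, marginal=0.99):
    """Classify a link by the fraction of bits that arrived unchanged.

    Assumption: 'high' and 'marginal' are hypothetical threshold
    fractions; anything below 'marginal' is treated as low quality.
    """
    if len(sent) != len(received) or not sent:
        raise ValueError("sequences must be non-empty and equal in length")
    correct = sum(1 for s, r in zip(sent, received) if s == r)
    ratio = correct / len(sent)
    if ratio >= high:
        return "high"
    if ratio >= marginal:
        return "marginal"
    return "low"


sent = prbs7(1000)
slightly_damaged = [b ^ 1 if i < 5 else b for i, b in enumerate(sent)]
badly_damaged = [b ^ 1 if i < 50 else b for i, b in enumerate(sent)]
```

  • Comparing the returned level with the class assigned by the data model indicates whether the link was correctly classified.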
  • network device 230 may update the class of the link based on a result of the diagnostic test. For example, network device 230 may update the class of the link if the diagnostic test indicates that an actual quality of the link is different from a quality associated with the classification.
  • network device 230 may update the class of the link by updating the data model. For example, network device 230 may modify (e.g., increase, decrease, etc.) one or more of the quality coefficient values based on a result of the diagnostic test. In this case, network device 230 may re-provide the link quality information as input to the data model, which may cause the data model to update the class of the link using the one or more modified quality coefficient values. In this way, network device 230 increases accuracy of subsequent predictions (e.g., relative to not updating the data model).
  • As an example, assume that network device 230 classifies the link into a class associated with high link quality and that the link fails the diagnostic test.
  • network device 230 may update the class of the link to a class associated with marginal link quality or to a class associated with low link quality based on an output of the diagnostic test. Additionally, network device 230 may modify quality coefficient values that are associated with classifying the link, thereby allowing network device 230 to make more accurate subsequent link quality predictions. In this way, network device 230 may update the data model to eliminate false-positives (e.g., predictions that link quality is high when actual link quality is low).
  • As another example, assume that network device 230 classifies the link into a class associated with marginal link quality and that the link does not receive a diagnostic test score associated with marginal link quality.
  • network device 230 may update the class of the link to a class associated with high link quality or to a class associated with low link quality. If the link passes the diagnostic test (e.g., receives a score associated with high link quality), then network device 230 may update the data model by modifying quality coefficient values that are associated with classifying the link. If the link fails the diagnostic test (e.g., receives a score associated with low link quality), then network device 230 may update the data model by modifying quality coefficient values that are associated with classifying the link. In this way, network device 230 may update the data model to eliminate false-neutrals (e.g., predictions that link quality is marginal when actual link quality is high or low).
  • As another example, assume that network device 230 classifies the link into a class associated with low link quality and that the link passes the diagnostic test.
  • network device 230 may update the class of the link to a class associated with high link quality or to a class associated with marginal link quality. If the link passes the diagnostic test with a score associated with high link quality, then network device 230 may update the data model by modifying quality coefficient values that are associated with classifying the link. If the link passes the diagnostic test with a score associated with a marginal quality link, then network device 230 may update the data model by modifying (e.g., by a lower amount than an amount associated with updating the class of the link to a class associated with high link quality) quality coefficient values that are associated with classifying the link. In this way, network device 230 may update the data model to eliminate false-negatives (e.g., predictions that link quality is low when actual link quality is marginal or high).
  • As another example, assume that network device 230 classifies the link and that the diagnostic test confirms that the classification is correct.
  • network device 230 may update the data model with information indicating that the quality coefficient values used are the correct values. In this way, network device 230 is able to classify subsequent links that have similar link quality information with a higher degree of confidence.
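  • The four cases above (false positive, false neutral, false negative, and confirmed classification) can be summarized in one hedged sketch. It assumes a very simple update rule in which every quality coefficient value is shifted by a fixed step per class of disagreement, so that a low-to-marginal correction is smaller than a low-to-high correction. The rule, the step size, and the names are assumptions and do not represent the actual update used by the data model.

```python
CLASS_RANK = {"low": 0, "marginal": 1, "high": 2}


def update_coefficients(coefficients, predicted, actual, step=0.1):
    """Shift coefficients toward the class measured by the diagnostic test.

    Assumption: the adjustment is proportional to how many classes apart
    the prediction and the measurement are; a confirmed classification
    leaves the coefficients unchanged.
    """
    delta = step * (CLASS_RANK[actual] - CLASS_RANK[predicted])
    return {name: value + delta for name, value in coefficients.items()}


before = {"ber": 0.5, "eye_height": 0.25}
after_false_positive = update_coefficients(before, "high", "low")
after_false_negative = update_coefficients(before, "low", "marginal")
after_confirmed = update_coefficients(before, "marginal", "marginal")
```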
  • network device 230 may monitor reclassified links (i.e., links that have had the classification updated). For example, network device 230 may monitor reclassified links by receiving an increased frequency of link quality information (e.g., an increased frequency as compared to a frequency in which link quality information was received during block 440 ). Additionally, network device 230 may make an increased number of link quality predictions to verify whether the reclassification improved accuracy of link classification. In some implementations, network device 230 may monitor reclassified links over a time interval. In some cases, a duration of the interval may be associated with link quality (e.g., a longer interval for low quality links than for marginal quality links).
  • network device 230 determines that the link is a high risk link by classifying the link as having low link quality. In this case, network device 230 may monitor and test the link more frequently and more extensively than monitoring and testing associated with lower risk links.
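  • A small sketch of a class-dependent monitoring plan is shown below. The base sampling period, the divisors, and the durations are hypothetical numbers; the only properties taken from the description are that reclassified links are sampled more frequently and that low quality links are monitored over a longer interval than marginal quality links.

```python
from dataclasses import dataclass

BASE_SAMPLE_PERIOD_S = 60.0  # assumed period used during normal classification


@dataclass(frozen=True)
class MonitorPlan:
    sample_period_s: float  # time between link quality samples
    duration_s: float       # how long the increased monitoring lasts


def plan_for_reclassified(link_class: str) -> MonitorPlan:
    """Return a hypothetical monitoring plan for a reclassified link."""
    if link_class == "low":
        return MonitorPlan(BASE_SAMPLE_PERIOD_S / 6, 24 * 3600.0)
    if link_class == "marginal":
        return MonitorPlan(BASE_SAMPLE_PERIOD_S / 2, 6 * 3600.0)
    if link_class == "high":
        return MonitorPlan(BASE_SAMPLE_PERIOD_S, 3600.0)
    raise ValueError(f"unknown link class: {link_class}")


low_plan = plan_for_reclassified("low")
marginal_plan = plan_for_reclassified("marginal")
```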
  • network device 230 may iteratively determine whether the link is correctly classified and may iteratively update the class of the link until the link is correctly classified.
  • a data model that has been continuously updated by network device 230 may correctly classify link quality in all (or most) situations, thereby allowing network device 230 to perform one or more actions associated with improving link quality.
  • process 400 may include performing, after classifying the link, one or more actions associated with improving link quality (block 470 ).
  • network device 230 may adapt to environmental conditions associated with the link (e.g., a temperature level, a voltage level, etc.), provide a report and/or a recommendation to an interested party (e.g., a network administrator, a technician, etc.), and/or the like.
  • network device 230 may adapt to environmental conditions associated with the link. For example, network device 230 may identify false predictions (e.g., false positives, false neutrals, false negatives, etc.), and may adapt to environmental conditions by retraining the data model after identifying the false predictions. In this way, network device 230 may retrain the data model to adapt to environmental conditions of a particular network device.
  • network device 230 may implement a forecasting technique with a separate data model. For example, network device 230 may train a separate data model that includes a set of characteristics associated with link faults (e.g., including data leading up to link failure). In this case, network device 230 may provide link quality information associated with an active link as input for the separate data model, which may cause the separate data model to output a projected time period at which the link is predicted to degrade in link quality past a particular link quality threshold.
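  • As a stand-in for the separate data model, the sketch below fits a least-squares line to recent link quality samples and extrapolates the time at which the line crosses a quality threshold. A linear trend is an assumption made purely for illustration; the description does not state what form the forecasting model takes, and the function name is hypothetical.

```python
def projected_time_to_threshold(times, values, threshold):
    """Extrapolate when a fitted linear quality trend reaches threshold.

    Returns the projected time, or None if quality is not degrading.
    Assumption: quality decreases roughly linearly over the window.
    """
    n = len(times)
    if n < 2 or n != len(values):
        raise ValueError("need at least two (time, value) samples")
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    var_t = sum((t - mean_t) ** 2 for t in times)
    if var_t == 0:
        raise ValueError("sample times must not all be identical")
    slope = sum((t - mean_t) * (v - mean_v)
                for t, v in zip(times, values)) / var_t
    if slope >= 0:
        return None  # flat or improving: no projected degradation
    intercept = mean_v - slope * mean_t
    return (threshold - intercept) / slope


# Quality drops 0.1 per time unit from 1.0; threshold 0.5 is hit at t = 5.
eta = projected_time_to_threshold([0, 1, 2, 3], [1.0, 0.9, 0.8, 0.7], 0.5)
```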
  • network device 230 may blacklist the link, thereby preventing the link from actively supporting traffic flow. If the link does not have a history of being classified into the class associated with low link quality, then network device 230 may perform a diagnostic test to determine an actual quality of the link, as described above. In this way, network device 230 is able to reduce link faults (e.g., packet loss, signal attenuation, etc.) within network device 230 until the link and/or the board is repaired and/or replaced.
  • network device 230 may provide, to network management device 240 , a recommendation to repair or replace one or more hardware components and/or software modules associated with the link.
  • network device 230 may generate a recommendation that includes link quality information for the link, results of a diagnostic test, and/or instructions indicating a particular action to be performed (e.g., repair the link, repair the board, replace the link, replace the board, etc.).
  • network device 230 may provide the recommendation to network management device 240 .
  • network device 230 may automatically schedule an appointment to have an interested party (e.g., a technician) repair a hardware component and/or a software module associated with the link.
  • network device 230 may perform one or more actions associated with improving link quality. Additionally, by predicting link faults before the faults occur, network device 230 prevents traffic loss. Furthermore, by performing one or more actions associated with improving accuracy of link classification and/or link quality, network device 230 improves efficiency and reliability of network communications.
  • process 400 may include additional blocks, fewer blocks, different blocks, or differently arranged blocks than those depicted in FIG. 4 . Additionally, or alternatively, two or more of the blocks of process 400 may be performed in parallel.
  • FIGS. 5A-5C are diagrams of one or more example implementations relating to the example process shown in FIG. 4 .
  • FIG. 5A shows an example implementation where network device 230 may identify a false positive, and may update the data model to improve accuracy of link classification.
  • FIG. 5B shows an example implementation where network device 230 may identify a false neutral, and may update the data model to improve accuracy of link classification.
  • FIG. 5C shows an example implementation where network device 230 may identify a false negative, and may update the data model to improve accuracy of link classification. In this way, network device 230 may continue to improve link classification until the data model is able to correctly classify all (or close to all) of the links, thereby ensuring that actions may be taken to improve link quality.
  • network device 230 may identify false positives (e.g., the data model may falsely classify a link into a class associated with high link quality when actual link quality is marginal or low). As shown by reference number 502 , network device 230 may classify the link into a class associated with high link quality. In this case, network device 230 may determine whether actual link errors (e.g., hardware errors) are associated with the link by executing a FEC technique, a CRC technique, or the like.
  • network device 230 may idle and/or monitor the link.
  • the FEC technique, the CRC technique, or the like serve to verify that the link is correctly classified as having high link quality.
  • network device 230 may update the data model.
  • network device 230 may update the data model by modifying one or more quality coefficient values.
  • modifying the one or more quality coefficient values associated with the link quality information may allow the data model to correctly classify links with similar link quality information when the data model is used for subsequent classifications.
  • network device 230 is able to identify false positives, and may update the data model to improve the accuracy of link classification.
  • network device 230 may identify false neutrals (e.g., the data model may falsely classify a link into a class associated with marginal link quality when actual link quality is high or low). As shown by reference number 508 , network device 230 may classify a link into a class associated with marginal link quality. In this case, and as shown by reference number 510 , network device 230 may disable, monitor, and test the link, in the same manner described above. As shown by reference number 512 , if the link passes the test with a high passing score (shown as a “high pass”), then network device 230 may update the class of the link to a class associated with high link quality. In this case, network device 230 may re-enable the link to allow the link to continue supporting network traffic flow.
  • network device 230 may update the data model. For example, network device 230 may update the data model by modifying one or more quality coefficient values. In this case, modifying the one or more quality coefficient values associated with the link quality information may allow the data model to correctly classify links with similar link quality information when the data model is used for subsequent classifications.
  • network device 230 may update the class of the link to a class associated with low link quality. In this case, and as shown by reference number 518 , network device 230 may update the data model. For example, network device 230 may update the data model by modifying one or more quality coefficient values. This may allow the data model to correctly classify links with similar link quality information when the data model is used for subsequent classifications.
  • network device 230 may update the class of the link to a class associated with marginal link quality. In this case, network device 230 may update the data model by updating one or more values associated with confidence scores, as described above.
  • network device 230 is able to identify false neutrals, and may update the data model to improve the accuracy of link classification.
  • network device 230 may identify false negatives (e.g., the data model may falsely classify a link into a class associated with low link quality when actual link quality is marginal or high). As shown by reference number 522 , network device 230 may classify a link into a class associated with low link quality.
  • network device 230 may determine whether the link has received a threshold number of classifications associated with low link quality. In this case, network device 230 may compare a current number of times the link is classified into a class associated with low link quality to a threshold number of classifications associated with low link quality. As shown by reference number 526 , network device 230 may blacklist the link, as described elsewhere herein. In this way, network device 230 is able to prevent links that are repeatedly classified as low quality from supporting traffic flow.
  • network device 230 may disable, monitor, and test the link (e.g., using a diagnostic test), as described elsewhere herein.
  • network device 230 may update the class of the link to a class associated with high link quality. In this case, and as shown by reference number 532 , network device 230 may update the data model by modifying one or more quality coefficient values. This may allow the data model to correctly classify a subsequent link with similar link quality information.
  • network device 230 may perform one or more actions associated with repairing the link. For example, network device 230 may provide a request to a network management device to repair the link, repair a board associated with the link, replace the board, and/or the like.
  • FIGS. 5A-5C are provided merely as examples. Other examples are possible and may differ from what was described with regard to FIGS. 5A-5C .
  • network device 230 is able to identify false positives, false neutrals, and/or false negatives, and may update the data model to improve the accuracy of link classification.
  • network device 230 is able to identify situations where a link is beginning to degrade in quality, and is able to take corrective actions needed to repair the link, repair the board, replace the link, or replace the board, as described elsewhere herein. In this way, network device 230 improves network performance relative to taking corrective action after a link fails.
  • the term component is intended to be broadly construed as hardware, firmware, and/or a combination of hardware and software.
  • satisfying a threshold may refer to a value being greater than the threshold, more than the threshold, higher than the threshold, greater than or equal to the threshold, less than the threshold, fewer than the threshold, lower than the threshold, less than or equal to the threshold, equal to the threshold, etc.

Abstract

A device may receive a trained data model that has been trained using historical link quality information associated with a set of links. The device may determine, after receiving the trained data model, link quality information associated with a link that is actively supporting traffic. The device may classify the link by using the link quality information as input for the data model. The data model may classify the link into a class of a set of classes associated with measuring link quality. The device may determine an actual quality level of the link. The device may selectively update the class of the link after determining the actual link quality of the link. The device may perform one or more actions associated with improving link quality based on classifying the link and/or selectively updating the class of the link.

Description

RELATED APPLICATION
This application is a continuation of U.S. patent application Ser. No. 15/666,015, filed Aug. 1, 2017 (now U.S. Pat. No. 10,298,465), the disclosure of which is incorporated herein by reference.
BACKGROUND
A network device may include internal links and external links. For example, a network device may include internal links that allow traffic flow (e.g., packets) between components of the network device and/or external links that allow traffic flow between network devices.
SUMMARY
According to some possible implementations, a device may include one or more processors to receive a trained data model. The data model may be trained with historical link quality information associated with a set of links. The data model may include one or more values associated with measuring link quality. The one or more processors may determine, after receiving the trained data model, link quality information associated with a link that is actively supporting traffic flow. The one or more processors may classify the link by using the link quality information as input for the data model. The data model may classify the link into a first class associated with a first measure of link quality, a second class associated with a second measure of link quality, or a third class associated with a third measure of link quality. The one or more processors may determine whether the link is correctly classified by updating the data model with information associated with improving accuracy of classifying the link. The one or more processors may update a class of the link to the first class, the second class, or the third class after determining whether the link is correctly classified. The one or more processors may perform one or more actions associated with improving link quality based on classifying the link and/or updating a class of the link.
According to some possible implementations, a non-transitory computer-readable medium may store one or more instructions that, when executed by one or more processors, cause the one or more processors to obtain a data model that is trained using historical link quality information associated with a set of links. The historical link quality information may include one or more values associated with measuring link quality. The one or more instructions may cause the one or more processors to determine, after obtaining the data model, link quality information associated with a link that is actively supporting traffic flow. The one or more instructions may cause the one or more processors to classify the link by using the link quality information as input for the data model. The data model may classify the link into a class of a set of classes associated with measuring link quality. The one or more instructions may cause the one or more processors to determine whether the link is correctly classified by performing one or more actions associated with improving accuracy of classifying the link. The one or more instructions may cause the one or more processors to selectively update the class of the link after determining whether the link is correctly classified. The one or more instructions may cause the one or more processors to perform one or more actions associated with improving link quality based on classifying the link and/or selectively updating the class of the link.
According to some possible implementations, a method may include receiving, by a device, a trained data model. The trained data model may be trained using historical link quality information associated with a set of links. The method may include determining, by the device and after receiving the trained data model, link quality information associated with a link that is actively supporting traffic. The method may include classifying the link, by the device, by using the link quality information as input for the data model. The data model may classify the link into a class of a set of classes associated with measuring link quality. The method may include determining, by the device, an actual quality level of the link. The method may include selectively updating the class of the link, by the device, after determining the actual link quality of the link. The method may include performing, by the device, one or more actions associated with improving link quality based on classifying the link and/or selectively updating the class of the link.
BRIEF DESCRIPTION OF THE DRAWINGS
FIGS. 1A-1D are diagrams of an overview of an example implementation described herein;
FIG. 2 is a diagram of an example environment in which systems and/or methods, described herein, may be implemented;
FIG. 3 is a diagram of example components of one or more devices of FIG. 2;
FIG. 4 is a flow chart of an example process for using machine learning to monitor link quality and detect link faults; and
FIGS. 5A-5C are diagrams of one or more example implementations relating to the example process shown in FIG. 4.
DETAILED DESCRIPTION
The following detailed description of example implementations refers to the accompanying drawings. The same reference numbers in different drawings may identify the same or similar elements.
As demand for data services increases, the amount of traffic that a network device handles may increase. This may lead to higher bandwidth interconnects within a board (e.g., a printed circuit board (PCB)) located within the network device, and may further cause links (e.g., Serializer-Deserializer (SerDes) links) to be run at higher speeds. Furthermore, the board may be densely packed with internal links, causing interference and leading to link degradation over the life of the board. However, many network devices are only able to detect link faults (e.g., link degradation) after the faults occur.
Some implementations described herein provide a network device to use one or more machine learning techniques to monitor link quality and to detect link faults prior to the link incurring the fault. For example, a model generation device may use historical link quality information associated with a set of links to train a data model (e.g., by creating a prediction function), and may provide the trained data model (e.g., the prediction function) to a set of network devices. In this case, the set of network devices may classify links that actively support traffic flow by using link quality information associated with the links as input for the data model. Furthermore, the network device may perform one or more actions associated with improving accuracy of link classification and/or one or more actions associated with improving link quality (e.g., request to repair a link and/or a board, request to replace a board, adapt to environmental conditions associated with a link, etc.).
In this way, the set of network devices predict link faults and, by taking one or more pre-emptive actions, eliminate traffic loss via links associated with the set of network devices. Furthermore, by performing one or more actions associated with improving link quality, the set of network devices improve efficiency and reliability of network communications.
FIGS. 1A-1D are diagrams of an overview of an example implementation 100 described herein. As shown in FIGS. 1A-1D, example implementation 100 may include a model generation device that uses one or more machine learning techniques to train a data model that may be used to monitor link quality and detect link faults. Additionally, as described in detail further herein, a set of network devices may use the trained data model to perform one or more actions associated with improving accuracy of link classification and/or one or more actions associated with improving link quality.
As shown in FIG. 1A, and by reference number 105, the model generation device may receive, from a data source, historical link quality information. The historical link quality information may include one or more values associated with measuring link quality. In this case, the historical link quality information may include one or more actual measures of link quality, one or more predictors of link quality, and/or one or more environment conditions.
For example, the one or more actual measures of link quality may include one or more values associated with identifying errors in data transmission, such as a forward error correction (FEC) value or a cyclic redundancy check (CRC) value, one or more values measuring signal integrity, such as a signal-to-noise ratio (SNR) value, and/or the like. Additionally, the one or more predictors of link quality may include a bit error rate (BER) value, a link eye width value, a link eye height value, a link quality slope, and/or the like. Furthermore, the one or more environment conditions may include a temperature value (e.g., a temperature of the board, a temperature of the system, etc.), a link uptime value, and/or the like.
Furthermore, the historical link quality information may be associated with a set of network devices that operate under different environmental conditions, thereby allowing the model generation device to train a data model that accounts for a number of different environmental conditions.
As shown by reference number 110, the model generation device may train a data model. For example, the model generation device may train the data model by creating a prediction function that is able to classify links that are actively supporting traffic flow, as described further herein. In this case, the model generation device may create the prediction function by associating values included in the historical link quality information with quality coefficient values (e.g., values indicating a particular link quality measurement). Additionally, the model generation device may configure one or more weight values that may be used in determining an overall link quality score.
As an example, assume the historical link quality information includes five values (e.g., value A, value B, value C, value D, and value E). Further assume that value A has a positive influence on link quality (e.g., a high A value is a strong indicator of high link quality), that values B-D do not have a high influence on link quality, and that value E has a negative influence on link quality (e.g., a high E value is a strong indicator of low link quality). In this case, a prediction function may account for the varied degrees of influence by assigning values a polarity (e.g., positive, negative) and a weight (e.g., Link Quality=0.9A+0.1B+0.2C+0.05D−0.9E).
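The weighted prediction function in the example above can be sketched as follows. This is a minimal illustration only: the weights are the ones quoted in the example, while the function name `link_quality_score`, the dictionary layout, and the normalized input values are assumptions introduced here and are not part of the described implementations.

```python
# Weights copied from the example prediction function:
# Link Quality = 0.9A + 0.1B + 0.2C + 0.05D - 0.9E
# (In practice these would be produced by training, not hard-coded.)
WEIGHTS = {"A": 0.9, "B": 0.1, "C": 0.2, "D": 0.05, "E": -0.9}


def link_quality_score(values, weights=WEIGHTS):
    """Return the weighted sum of the link quality values (hypothetical helper)."""
    return sum(weights[name] * values[name] for name in weights)


# Hypothetical, normalized measurements for two links.
healthy = {"A": 1.0, "B": 0.5, "C": 0.5, "D": 0.5, "E": 0.0}
degraded = {"A": 0.0, "B": 0.5, "C": 0.5, "D": 0.5, "E": 1.0}

healthy_score = link_quality_score(healthy)
degraded_score = link_quality_score(degraded)
```

With these assumed inputs, the healthy link scores about 1.075 and the degraded link about -0.725, reflecting the positive polarity of value A and the negative polarity of value E.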
As shown by reference number 115, the model generation device may provide the data model to a set of network devices (shown as network device 1 through network device N). For example, the model generation device may provide the prediction function to the set of network devices to serve as an initial data model that each network device may implement when monitoring traffic flow.
In this way, the set of network devices may receive a trained data model that may be used to classify links that actively support traffic flow.
While FIGS. 1B-1D show a first network device (shown as network device 1) performing one or more actions associated with improving accuracy of link classification and/or one or more actions associated with improving link quality, in practice, the set of network devices (network device 1 through network device N) each perform one or more actions associated with improving accuracy of link classification and/or one or more actions associated with improving link quality.
As shown in FIG. 1B, and by reference number 120, the first network device may analyze traffic associated with a link to determine link quality information. For example, traffic may be passing through a link, and the first network device may use one or more performance monitoring techniques to determine link quality information for the link.
As an example, the first network device may determine a BER value by comparing bit string values before and after the bit string travels through the link. As another example, the first network device may use one or more sensors to measure temperature values associated with a board. As another example, the first network device may monitor traffic to determine a noise value and a signal value, and may process the values to determine an SNR value. In this way, the first network device may determine link quality information for a link that is actively supporting traffic flow.
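As a rough sketch of two of the measurements described above, a BER value may be computed by counting mismatched bits between the transmitted and received bit strings, and an SNR value may be computed from signal and noise power. The function names and the choice to express SNR in decibels are assumptions made for illustration.

```python
import math


def bit_error_rate(sent: str, received: str) -> float:
    """Fraction of bit positions that differ between two equal-length bit strings."""
    if not sent or len(sent) != len(received):
        raise ValueError("bit strings must be non-empty and of equal length")
    errors = sum(s != r for s, r in zip(sent, received))
    return errors / len(sent)


def snr_db(signal_power: float, noise_power: float) -> float:
    """Signal-to-noise ratio in decibels (assumes both powers are positive)."""
    return 10.0 * math.log10(signal_power / noise_power)


# Hypothetical sample: two of eight bits changed value while crossing the link.
ber = bit_error_rate("10110010", "10100011")
snr = snr_db(100.0, 1.0)
```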
As shown by reference number 125, the first network device may classify the link using the data model. For example, the first network device may classify the link by using the link quality information as input to the data model, which may cause the data model to output an overall link quality score. In some cases, the overall link quality score may be associated with one or more link quality classes, such as a class associated with high link quality, a class associated with marginal link quality, or a class associated with low link quality. In the example shown, the data model may classify the link into a class associated with marginal link quality.
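One possible way to map an overall link quality score to the three classes is a pair of thresholds, as sketched below. The threshold values and the name `classify_link` are hypothetical; the description does not specify how score ranges are associated with classes.

```python
def classify_link(score, high_threshold=0.7, low_threshold=0.3):
    """Map an overall link quality score to a class label.

    The thresholds are illustrative assumptions, not values from the source.
    """
    if score >= high_threshold:
        return "high"
    if score <= low_threshold:
        return "low"
    return "marginal"
```

For instance, under these assumed thresholds a score of 0.5 would be placed in the class associated with marginal link quality.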
By using the data model to classify the link, the first network device may perform one or more actions associated with improving accuracy of link classification and/or one or more actions associated with improving link quality, as described further herein.
As shown in FIG. 1C, and by reference number 130, the first network device may determine an actual quality level of the link. For example, the first network device may determine an actual quality level of the link while the link is active and/or may determine an actual quality level of the link after disabling the link. As an example, the first network device may determine an actual quality level of the link while the link is active by executing non-intrusive techniques such as an FEC technique, a CRC technique, or the like. As another example, the first network device may disable the link after traffic has been redirected to another link (so as to prevent traffic drop on the link), and may determine an actual quality level of the link by performing a diagnostic test on the link. In some cases, the first network device may use a pseudorandom binary sequence (PRBS) test to monitor bits that travel through the link. In this case, the link may pass or fail the PRBS test based on a number of bits that successfully travel through the link. As an example, assume that a threshold amount of bits change value (e.g., from 0 to 1 or from 1 to 0) as the bits travel through the link. This may cause the link to fail the diagnostic test, indicating that the link is a low quality link.
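A PRBS-style diagnostic test of the kind described above might be sketched as follows. The sketch assumes a PRBS7 pattern (generator polynomial x^7 + x^6 + 1), models the disabled link as a hypothetical `channel` callable, and uses an assumed `max_errors` parameter as the pass/fail threshold; none of these specifics are given in the description.

```python
def prbs7(n_bits, seed=0x02):
    """Generate n_bits of a PRBS7 sequence using a 7-bit linear feedback shift register."""
    state = seed
    bits = []
    for _ in range(n_bits):
        new_bit = ((state >> 6) ^ (state >> 5)) & 1  # taps for x^7 + x^6 + 1
        state = ((state << 1) | new_bit) & 0x7F
        bits.append(new_bit)
    return bits


def prbs_test(channel, n_bits=1000, max_errors=0):
    """Send a PRBS pattern through `channel` and count bits that changed value.

    `channel` is a hypothetical stand-in for the link under test: it takes a
    list of bits and returns the list of bits observed at the far end.
    Returns (passed, error_count).
    """
    sent = prbs7(n_bits)
    received = channel(sent)
    errors = sum(s != r for s, r in zip(sent, received))
    return errors <= max_errors, errors


# A clean link returns every bit unchanged; a faulty one inverts every bit.
clean_result = prbs_test(lambda bits: list(bits))
faulty_result = prbs_test(lambda bits: [b ^ 1 for b in bits])
```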
Additionally, the first network device may determine whether the link is correctly classified. For example, the first network device may compare the classification of the link and the actual quality level of the link (e.g., as indicated by the diagnostic test). If the classification of the link and the actual quality level of the link do not match (i.e., the link is incorrectly classified), then the first network device may update the data model by modifying one or more quality coefficient values to improve subsequent classifications of the link. For example, the link may be classified as a marginal quality link (e.g., as a result of training data), but may have an actual quality level of low link quality due to environment conditions that are specific to the first network device. In this way, the first network device may update the data model to dynamically adapt to the environment conditions that are specific to the first network device.
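The description does not state how the quality coefficient values are modified. As one possible illustration, the sketch below applies a simple error-driven (delta-rule style) adjustment when the predicted class and the actual class differ. The numeric class targets, the learning rate, and the name `update_coefficients` are all assumptions introduced here.

```python
# Hypothetical numeric targets for each class, used only to size the correction.
CLASS_TARGET = {"low": 0.0, "marginal": 0.5, "high": 1.0}


def update_coefficients(weights, values, predicted_class, actual_class,
                        learning_rate=0.1):
    """Return adjusted coefficients after comparing predicted and actual classes.

    If the classes match, the coefficients are returned unchanged (as a copy).
    Otherwise each coefficient is nudged in proportion to its input value and
    to the signed gap between the actual and predicted class targets.
    """
    if predicted_class == actual_class:
        return dict(weights)
    error = CLASS_TARGET[actual_class] - CLASS_TARGET[predicted_class]
    return {
        name: weight + learning_rate * error * values.get(name, 0.0)
        for name, weight in weights.items()
    }


# A link predicted as marginal fails its diagnostic test (actual class: low).
old_weights = {"A": 0.9, "E": -0.9}
measurements = {"A": 1.0, "E": 0.5}
new_weights = update_coefficients(old_weights, measurements, "marginal", "low")
```

In this assumed scheme, the adjusted coefficients produce a lower score for the same measurements, so a link with similar link quality information is more likely to fall into the low quality class on a later classification.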
As shown by reference number 135, the first network device may update the class of the link. For example, the first network device may update the class of the link based on a result of the diagnostic test. In this case, the first network device may update the class of the link to a different class than the previous classification. For example, the first network device may update the class of the link to a class associated with high link quality if the link passes the diagnostic test, and may update the class of the link to a class associated with low link quality if the link fails the diagnostic test. Alternatively, the first network device may update the class of the link by re-using the link quality information as input for the data model. In this case, the data model may output a more accurate link quality prediction as a result of modifying the one or more quality coefficient values associated with the data model. In the example shown, the first network device may update the class of the link to a class associated with low link quality.
By determining an actual quality level of the link and updating the class of the link, the first network device is able to improve accuracy of link classification. Furthermore, if the link is classified or reclassified as a low quality link, the first network device may disable the link to prevent subsequent traffic from suffering from packet loss.
As shown in FIG. 1D, and by reference number 140, the first network device may determine whether the link has received a threshold number of classifications associated with low link quality. For example, the first network device may monitor the link over an interval, and may periodically classify the link throughout the interval. In this case, if the first network device classifies the link as a low quality link, then the first network device may compare a current number of times the link is classified as a low quality link to a threshold number of classifications associated with low link quality. In this way, the first network device is able to identify whether to blacklist the link (i.e., prevent the link from being able to actively support traffic flow) or to monitor and test the link.
As shown by reference number 145, if the current number of times the link is classified as a low quality link does not satisfy the threshold number of classifications associated with low link quality, then the first network device may monitor and test the link. For example, the first network device may continue to monitor the link, perform an additional diagnostic test on the link, execute additional FEC techniques and/or update the data model to improve accuracy of link classification, and/or the like. In this way, the first network device is able to verify whether the link is a low quality link before blacklisting the link and/or performing additional actions associated with improving link quality.
As shown by reference number 150, if the current number of times the link is classified as a low quality link satisfies the threshold number of classifications associated with low link quality, then the first network device may blacklist the link. In this case, the first network device may blacklist the link to prevent the link from being able to actively support traffic flow. By blacklisting the link, the first network device avoids traffic drops and link faults that may result if the low quality link continues supporting traffic flow, thereby improving quality of network communications.
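The threshold comparison and blacklisting decision described with regard to reference numbers 140 through 150 can be sketched as a small counter. The class name `LinkMonitor`, the returned action labels, and the use of "greater than or equal to" as the meaning of satisfying the threshold are assumptions made for this illustration.

```python
from collections import Counter


class LinkMonitor:
    """Track low quality classifications per link and decide the next action."""

    def __init__(self, low_quality_threshold=3):
        self.low_quality_threshold = low_quality_threshold
        self.low_counts = Counter()   # link id -> number of low classifications
        self.blacklist = set()        # links barred from supporting traffic flow

    def record_classification(self, link_id, link_class):
        """Record one classification and return a hypothetical action label."""
        if link_id in self.blacklist:
            return "blacklisted"
        if link_class != "low":
            return "ok"
        self.low_counts[link_id] += 1
        if self.low_counts[link_id] >= self.low_quality_threshold:
            self.blacklist.add(link_id)
            return "blacklist"
        return "monitor_and_test"


# With a threshold of 2, the second low classification blacklists the link.
monitor = LinkMonitor(low_quality_threshold=2)
actions = [monitor.record_classification("link-1", "low") for _ in range(3)]
```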
As shown by reference number 155, the first network device may generate a recommendation. For example, the first network device may generate a recommendation to repair the link (e.g., an internal link, an external link) and/or the board, replace the link (e.g., an external link) and/or the board, or the like. In this case, the recommendation may include the link quality information associated with the link, results of the diagnostic test, and/or instructions indicating a particular action to be performed (e.g., repair the link, repair the board, replace the link, replace the board, etc.).
As shown by reference number 160, the first network device may provide the recommendation to a network management device associated with an interested party (e.g., a technician). In this way, the interested party may repair and/or replace the link and/or the board, thereby improving network communications relative to allowing a low quality link to continue to support traffic flow.
In some implementations, the first network device may generate and provide a recommendation for a link that is classified or reclassified into a class associated with marginal link quality. In this way, the interested party may repair and/or replace the link and/or the board prior to packet drop and/or signal loss.
As shown by reference number 165, the first network device may continue tuning the data model. For example, the first network device may continue to monitor traffic flow and modify the data model to improve accuracy of link classification, adapt to environmental conditions associated with the link, use a forecasting technique to identify a link fault prior to the fault occurring, and/or the like.
In this way, the first network device is able to manage link quality and predict link faults prior to the occurrence of the link faults.
As indicated above, FIGS. 1A-1D are provided merely as an example. Other examples are possible and may differ from what was described with regard to FIGS. 1A-1D.
FIG. 2 is a diagram of an example environment 200 in which systems and/or methods, described herein, may be implemented. As shown in FIG. 2, environment 200 may include one or more data sources 210, a model generation device 220, one or more network devices 230-1 through 230-N (N≥1) (hereinafter referred to collectively as “network devices 230”, and individually as “network device 230”), a network management device 240, and/or a network 250. Devices of environment 200 may interconnect via wired connections, wireless connections, or a combination of wired and wireless connections.
Data source 210 includes one or more devices capable of receiving, storing, and/or providing historical link quality information. For example, data source 210 may include a server device or a group of server devices. In some implementations, data source 210 may interact with a set of network devices 230 (or other devices that may monitor link quality) to receive link quality information and store the link quality information as historical link quality information. In some implementations, data source 210 may store historical link quality information associated with a set of links. In some implementations, data source 210 may provide the historical link quality information to model generation device 220 and/or network device 230.
Model generation device 220 includes one or more devices capable of receiving, storing, processing, and/or providing information associated with link quality. For example, model generation device 220 may include a server device or a group of server devices. In some implementations, model generation device 220 may be implemented as a cloud platform. Alternatively, model generation device 220 may be implemented as a server device (e.g., an on-site server device). In some implementations, model generation device 220 may receive historical link quality information from data source 210. In some implementations, model generation device 220 may process the historical link quality information to train a data model (e.g., by creating a prediction function). In some implementations, model generation device 220 may provide a trained data model (e.g., the prediction function) to network device 230.
Network device 230 includes one or more devices capable of receiving, processing, forwarding, and/or transferring information associated with a link. For example, network device 230 may include a router, such as a label switching router (LSR), a label edge router (LER), an ingress router, an egress router, a provider router (e.g., a provider edge router, a provider core router, etc.), a virtual router, or the like. Additionally, or alternatively, network device 230 may include a gateway, a switch, a firewall, a hub, a bridge, a reverse proxy, a server (e.g., a proxy server, a cloud server, a data center server, etc.), a load balancer, or a similar device. In some implementations, network device 230 may be a physical device implemented within a housing, such as a chassis. In some implementations, network device 230 may be a virtual device implemented by one or more computer devices of a cloud computing environment or a data center.
In some implementations, network device 230 may include or connect to a set of links, and the set of links may be monitored (e.g., using a hardware component, such as a sensor or a tap, using a software module, etc.) to identify link quality information associated with the set of links. In some implementations, network device 230 may use a data model (e.g., a prediction function) to classify the link, and may perform one or more actions associated with improving accuracy of link classification (e.g., as shown in FIG. 2 by feedback loops, network devices 230 may tune the data model) and/or one or more actions associated with improving link quality. In some implementations, network device 230 may provide link quality information to data source 210 (e.g., which may be stored as historical link quality information). In some implementations, network device 230 may provide a recommendation to repair a link to network management device 240.
Network management device 240 includes one or more devices capable of receiving, storing, processing, and/or providing information associated with a link. For example, network management device 240 may include a computing device, such as a desktop computer, a laptop computer, a tablet computer, a server device, a mobile phone (e.g., a smart phone or a radiotelephone), a wearable computer (e.g., a smart watch, a smart band, a smart pair of eyeglasses, etc.), a sensor device, or a similar type of device. In some implementations, network management device 240 may receive, from network device 230, a recommendation to repair a link, to replace a board associated with the link, or the like. In this case, network management device 240 may schedule an appointment to repair the link or to replace a board associated with the link, and/or may provide a response message to network device 230 indicating that the appointment has been scheduled.
Network 250 includes one or more wired and/or wireless networks. For example, network 250 may include a cellular network (e.g., a fifth generation (5G) network, a fourth generation (4G) network, such as a long-term evolution (LTE) network, a third generation (3G) network, a code division multiple access (CDMA) network, etc.), a public land mobile network (PLMN), a local area network (LAN), a wide area network (WAN), a metropolitan area network (MAN), a telephone network (e.g., the Public Switched Telephone Network (PSTN)), a private network, an ad hoc network, an intranet, the Internet, a fiber optic-based network, a cloud computing network, or the like, and/or a combination of these or other types of networks.
The number and arrangement of devices and networks shown in FIG. 2 are provided as an example. In practice, there may be additional devices and/or networks, fewer devices and/or networks, different devices and/or networks, or differently arranged devices and/or networks than those shown in FIG. 2. Furthermore, two or more devices shown in FIG. 2 may be implemented within a single device, or a single device shown in FIG. 2 may be implemented as multiple, distributed devices. Additionally, or alternatively, a set of devices (e.g., one or more devices) of environment 200 may perform one or more functions described as being performed by another set of devices of environment 200.
FIG. 3 is a diagram of example components of a device 300. Device 300 may correspond to data source 210, model generation device 220, network device 230, and/or network management device 240. In some implementations, data source 210, model generation device 220, network device 230, and/or network management device 240 may include one or more devices 300 and/or one or more components of device 300. As shown in FIG. 3, device 300 may include a switching fabric 310, one or more line cards 320, one or more links (e.g., a set of external links 330, a set of inter-board links 340, a set of intra-board (onboard) links 342, etc.), and/or a controller 350. In some implementations, traffic between switching fabric 310 and controller 350 may be provided and/or received through a single inter-board link 340. In some implementations, traffic between switching fabric 310 and controller 350 may be provided and/or received through a set of inter-board links 340, where each inter-board link 340 may be designated for a subset of external links 330 and/or a subset of line cards 320. In some implementations, line card 320 may use a set of inter-board links 340 to communicate with one or more corresponding planes of a switching fabric 310.
Switching fabric 310 interconnects external links 330 via line cards 320. In some implementations, switching fabric 310 may be implemented using one or more switching fabric components 312 (e.g., one or more crossbars, one or more busses, one or more shared memories, and/or one or more planes). In some implementations, switching fabric components 312 may be connected using intra-board (onboard) links 342. In some implementations, switching fabric 310 may enable external links 330 to communicate. For example, switching fabric 310 may connect with one or more line cards 320 via a set of inter-board links 340, and the one or more line cards 320 may connect with the external links 330, as described further herein.
Line cards 320 include one or more line card components 322. For example, line cards 320 may include a modular electronic circuit designed to fit on a printed circuit board (PCB), and may include one or more line card components 322 (e.g., a packet processing component, a re-timer, etc.). A packet processing component may include one or more processors to process packets, and may process incoming traffic, such as by performing data link layer encapsulation or decapsulation. In some implementations, a packet processing component may receive a packet from switching fabric 310, may process the packet, and may output the processed packet to an appropriate external link 330 connected to line card 320. Additionally, or alternatively, a packet processing component may receive a packet from an external link 330, may process the packet, and may output the processed packet to switching fabric 310 for transfer to controller 350 and/or to another external link 330 (e.g., via the same packet processing component or a different packet processing component).
External link 330 is a point of attachment for physical links, and may be a point of ingress and/or egress for incoming and/or outgoing traffic, such as packets. In some implementations, a single line card 320 may be connected to multiple external links 330. In some implementations, a single line card 320 may be connected to a single external link 330. An external link 330 may permit communication between a first network device 230 and a second network device 230 that is a neighbor of the first network device 230. External link 330 may store packets (e.g., in a buffer) and/or may schedule packets for transmission on output physical links. External link 330 may support data link layer encapsulation or decapsulation and/or a variety of higher-level protocols.
Inter-board link 340 is a path that allows line card 320 and/or controller 350 to communicate with switching fabric 310. Inter-board link 340 may include, for example, a wired or wireless path, such as a fiber-optic path, an electrical path, or the like. In some implementations, there may be multiple inter-board links 340 between a single line card 320 and switching fabric 310. In some implementations, there may be a single inter-board link 340 between controller 350 and switching fabric 310. Intra-board (onboard) link 342 is a path that allows interconnection between line card components 322 and/or switching fabric components 312.
Controller 350 includes a processor in the form of, for example, a central processing unit (CPU), a graphics processing unit (GPU), an accelerated processing unit (APU), a microprocessor, a microcontroller, a digital signal processor (DSP), a field-programmable gate array (FPGA), an application-specific integrated circuit (ASIC), and/or another type of processor. The processor is implemented in hardware, firmware, or a combination of hardware and software. In some implementations, controller 350 may include one or more processors that may be programmed to perform a function. In some implementations, controller 350 may include a group of virtual devices that each includes one or more processors.
In some implementations, controller 350 may include a random access memory (RAM), a read only memory (ROM), and/or another type of dynamic or static storage device (e.g., a flash memory, a magnetic memory, an optical memory, etc.) that stores information and/or instructions for use by controller 350.
In some implementations, controller 350 may communicate with other devices, networks, and/or systems connected to device 300 to exchange information regarding network topology. Controller 350 may create routing tables based on the network topology information, create forwarding tables based on the routing tables, and forward the forwarding tables to line card 320, such as for use in performing route lookups for incoming and/or outgoing packets.
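As an illustrative, non-limiting sketch of the routing-table-to-forwarding-table step described above, the following Python example reduces a routing table to a forwarding table by keeping the lowest-metric next hop for each prefix, and then performs a longest-prefix-match lookup. The function names, the route format, and the next-hop labels are assumptions made for illustration only and are not taken from the implementations described herein.

```python
import ipaddress


def build_forwarding_table(routing_table):
    """Keep only the best (lowest-metric) next hop for each prefix.

    routing_table maps a prefix string to a list of (metric, next_hop)
    candidates. This structure is hypothetical.
    """
    return {
        ipaddress.ip_network(prefix): min(candidates)[1]
        for prefix, candidates in routing_table.items()
    }


def route_lookup(forwarding_table, destination):
    """Return the next hop for the longest matching prefix, or None."""
    address = ipaddress.ip_address(destination)
    matches = [net for net in forwarding_table if address in net]
    if not matches:
        return None
    best = max(matches, key=lambda net: net.prefixlen)
    return forwarding_table[best]


routing_table = {
    "10.0.0.0/8": [(20, "external-link-1"), (10, "external-link-2")],
    "10.1.0.0/16": [(5, "external-link-3")],
}
fib = build_forwarding_table(routing_table)
```

With this table, a lookup for 10.1.2.3 selects the more specific /16 entry, while 10.2.0.1 falls back to the /8 entry.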
Controller 350 may perform one or more processes described herein. Controller 350 may perform these processes in response to executing software instructions stored by a non-transitory computer-readable medium. A computer-readable medium is defined herein as a non-transitory memory device. A memory device includes memory space within a single physical storage device or memory space spread across multiple physical storage devices.
Software instructions may be read into a memory and/or a storage component associated with controller 350 from another computer-readable medium or from another device via a communication interface. When executed, software instructions stored in a memory and/or storage component associated with controller 350 may cause controller 350 to perform one or more processes described herein. Additionally, or alternatively, hardwired circuitry may be used in place of or in combination with software instructions to perform one or more processes described herein. Thus, implementations described herein are not limited to any specific combination of hardware circuitry and software.
The number and arrangement of components shown in FIG. 3 are provided as an example. In practice, device 300 may include additional components, fewer components, different components, or differently arranged components than those shown in FIG. 3. Additionally, or alternatively, a set of components (e.g., one or more components) of device 300 may perform one or more functions described as being performed by another set of components of device 300.
FIG. 4 is a flow chart of an example process 400 for using machine learning to monitor link quality and detect link faults. In some implementations, one or more process blocks of FIG. 4 may be performed by network device 230. In some implementations, one or more process blocks of FIG. 4 may be performed by another device or a group of devices separate from or including network device 230, such as data source 210, model generation device 220, and/or network management device 240.
As shown in FIG. 4, process 400 may include receiving historical link quality information associated with a set of links (block 410). For example, model generation device 220 may receive, from data source 210, historical link quality information associated with a set of links (e.g., a set of internal links within a network device, a set of external links between devices, etc.). A link may be an internal link that provides a connection between components and/or modules of a device (e.g., hardware components, software modules, etc.) or may be an external link that provides a connection between devices.
In some implementations, data source 210 may store historical link quality information. For example, data source 210 may store historical link quality information using a data structure, such as an array, a linked-list, a tree, a graph, a hash table, a database, and/or the like. In some implementations, data source 210 may store large quantities of data. For example, data source 210 may store thousands, millions, billions, or even trillions of data points.
In some implementations, model generation device 220 may receive historical link quality information. Historical link quality information may include one or more actual measures of link quality, one or more predictors of link quality, and/or one or more environment conditions. For example, the one or more actual measures of link quality may include one or more values associated with identifying errors in data transmission, such as a forward error correction (FEC) value, a cyclic redundancy check (CRC) value, one or more values measuring signal integrity, such as a signal-to-noise ratio (SNR) value, and/or the like. Additionally, the one or more predictors of link quality may include a bit error rate (BER) value, a link eye width value, a link eye height value, a link quality slope, and/or the like. Furthermore, the one or more environment conditions may include a temperature value (e.g., a temperature of the board, a temperature of the system, etc.), a link uptime value, and/or the like.
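As one possible representation, a single observation of historical link quality information may be sketched as a record that groups the actual measures, the predictors, and the environment conditions listed above. The class name, field names, units, and sample values below are hypothetical and are chosen only to make the grouping concrete.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class LinkQualityRecord:
    """One historical observation for a link (all field names are illustrative)."""
    link_id: str
    # Actual measures of link quality
    fec_errors: int
    crc_errors: int
    snr_db: float
    # Predictors of link quality
    bit_error_rate: float
    eye_width: float
    eye_height: float
    quality_slope: float
    # Environment conditions
    board_temp_c: float
    uptime_s: int

    def feature_vector(self):
        """Flatten the numeric fields, in declaration order, for use as model input."""
        return [float(v) for k, v in asdict(self).items() if k != "link_id"]


record = LinkQualityRecord(
    link_id="inter-board-0",
    fec_errors=3,
    crc_errors=0,
    snr_db=28.5,
    bit_error_rate=1e-12,
    eye_width=0.42,
    eye_height=0.31,
    quality_slope=-0.01,
    board_temp_c=55.0,
    uptime_s=86400,
)
```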
In some implementations, model generation device 220 may receive historical link quality information for a set of network devices 230. For example, model generation device 220 may receive historical link quality information for a set of network devices 230 that operate under different environment conditions. By receiving historical link quality information for devices operating under different environment conditions, the model generation device 220 may process the historical link quality information to train a data model, as described further herein.
In this way, model generation device 220 may receive historical link quality information associated with a set of links, such that the information may be used to train a data model.
As further shown in FIG. 4, process 400 may include training a data model using the historical link quality information (block 420). For example, model generation device 220 may, using a machine learning technique, train a data model that may be used to classify a link, as described further herein.
In some implementations, model generation device 220 may train a data model for use by a set of network devices 230. For example, model generation device 220 may train a data model by creating a prediction function, and may provide the prediction function to the set of network devices 230. In this case, model generation device 220 may create the prediction function by associating values included in the historical link quality information with dynamic quality coefficient values (e.g., which may be positive or negative). A quality coefficient value may be a value indicating a particular link quality measurement. In some cases, the quality coefficient values may be grouped into classes associated with high link quality, marginal link quality, or low link quality.
Additionally, model generation device 220 may create the prediction function by configuring one or more weight values that may be used in determining an overall link quality score. For example, model generation device 220 may assign weights to particular quality coefficient values. In this way, model generation device 220 may use the weighted quality coefficient values to determine an overall link quality score.
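A minimal sketch of such a prediction function is shown below, assuming that each metric value is mapped to a quality coefficient through fixed bands and that the overall link quality score is a weighted sum of those coefficients. The band boundaries, coefficient values, and weights are invented for illustration; in the implementations described herein they would result from training on the historical link quality information.

```python
# Hypothetical coefficient bands: (upper bound, coefficient), checked in order.
# A value above every bound receives FALLBACK_COEFFICIENT.
COEFFICIENT_BANDS = {
    "bit_error_rate": [(1e-12, 1.0), (1e-9, 0.0)],
    "fec_errors": [(10, 1.0), (1000, 0.0)],
    "board_temp_c": [(60.0, 1.0), (80.0, 0.0)],
}
FALLBACK_COEFFICIENT = -1.0

# Hypothetical weights assigned to each quality coefficient.
WEIGHTS = {"bit_error_rate": 0.5, "fec_errors": 0.3, "board_temp_c": 0.2}


def quality_coefficient(metric, value):
    """Map a raw metric value to a positive, zero, or negative coefficient."""
    for upper_bound, coefficient in COEFFICIENT_BANDS[metric]:
        if value <= upper_bound:
            return coefficient
    return FALLBACK_COEFFICIENT


def overall_link_quality_score(sample):
    """Weighted sum of the quality coefficients for every metric in the sample."""
    return sum(WEIGHTS[m] * quality_coefficient(m, v) for m, v in sample.items())


healthy = {"bit_error_rate": 1e-13, "fec_errors": 5, "board_temp_c": 70.0}
degraded = {"bit_error_rate": 1e-6, "fec_errors": 5000, "board_temp_c": 85.0}
```

Under these assumed values, the healthy sample scores higher than the degraded sample because two of its three coefficients are positive and none are negative.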
In some implementations, as described herein, model generation device 220 may train a data model using a supervised machine learning technique. Additionally, or alternatively, model generation device 220 may train a data model using a different type of machine learning technique, such as machine learning via clustering, dimensionality reduction, structured prediction, anomaly detection, neural networks, reinforcement learning, or the like.
In this way, model generation device 220 may, using the historical link quality information, train a data model that may be used by a set of network devices 230 to classify a set of links that actively support traffic flow. While the set of network devices 230 may utilize the data model to classify a set of links that actively support traffic flow, implementations described herein describe a data flow associated with a single network device 230 and a single link to illustrate the example process.
As further shown in FIG. 4, process 400 may include providing the data model to a network device (block 430). For example, model generation device 220 may provide the data model (e.g., that includes the prediction function) to network device 230. In this way, network device 230 may use the data model to classify a link that is actively supporting traffic flow.
As further shown in FIG. 4, process 400 may include classifying a link that is actively supporting traffic flow by using link quality information associated with the link as input for the data model (block 440). For example, network device 230 may receive the data model, and may determine link quality information associated with a link that is actively supporting traffic flow, and may use the link quality information as input for the data model. In this case, the data model may output an overall link quality score that serves to classify the link into a class associated with high link quality, a class associated with marginal link quality, or a class associated with low link quality. In other cases, the data model may use a different classification scheme, such as a classification scheme with more classes, a classification scheme with fewer classes, a classification scheme focused on a different metric (e.g., a subset of link quality, such as a particular link quality metric), or the like.
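The mapping from an overall link quality score to one of the three classes may be sketched as a pair of threshold comparisons. The threshold values and the names used below are assumptions for illustration; a trained data model may use different values or a different classification scheme.

```python
from enum import Enum


class LinkClass(Enum):
    HIGH = "high"
    MARGINAL = "marginal"
    LOW = "low"


def classify_link(score, high_threshold=0.5, low_threshold=-0.2):
    """Classify a link from its overall quality score (thresholds are hypothetical)."""
    if score >= high_threshold:
        return LinkClass.HIGH
    if score >= low_threshold:
        return LinkClass.MARGINAL
    return LinkClass.LOW
```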
In some implementations, network device 230 may determine link quality information associated with a link that is actively supporting traffic flow. For example, network device 230 may use one or more techniques to determine link quality for the link and/or monitor environment conditions. In some cases, network device 230 may use different quality monitoring techniques to monitor different values included in the link quality information. As an example, network device 230 may monitor bits traveling through the link to determine a BER value. As another example, network device 230 may use one or more sensors to monitor and measure temperature values associated with the board, an ASIC, etc. In this way, network device 230 may determine link quality information for a link that is actively supporting traffic flow.
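As a simple illustration of determining a BER value, the following sketch compares a transmitted byte sequence with the received byte sequence and divides the number of differing bits by the total number of bits. It assumes that a reference copy of the transmitted data is available, which may not hold for live traffic; the function name is hypothetical.

```python
def bit_error_rate(sent: bytes, received: bytes) -> float:
    """Fraction of bits that differ between the sent and received data."""
    if len(sent) != len(received):
        raise ValueError("sent and received must have the same length")
    if not sent:
        raise ValueError("at least one byte is required")
    # XOR exposes the differing bits; count the ones in each byte.
    bit_errors = sum(bin(a ^ b).count("1") for a, b in zip(sent, received))
    return bit_errors / (8 * len(sent))
```

For example, if one bit out of sixteen is flipped in transit, the function returns 1/16.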
In some implementations, network device 230 may classify a link into a class associated with high link quality. For example, network device 230 may use, as input for the data model, link quality information that includes one or more values associated with high link quality (e.g., a low BER value, a low FEC value, etc.). In this case, the data model may output an overall link quality score associated with high link quality. In some implementations, network device 230 may classify a link into a class associated with marginal link quality or low link quality in the same manner described above.
In this way, network device 230 is able to classify the link into a class associated with a particular quality level, and may prevent link failure and traffic loss by disabling links with a marginal or low link quality.
As further shown in FIG. 4, process 400 may include disabling the link if the link is classified into a class associated with marginal link quality or into a class associated with low link quality (block 450). For example, network device 230 may classify the link into a class associated with marginal link quality or low link quality, may redirect traffic to avoid traffic flow via the link, and may disable the link.
In some implementations, prior to disabling the link, network device 230 may determine an actual quality level of the link by executing an FEC technique, a CRC technique, or the like. For example, network device 230 may execute an FEC technique, a CRC technique, or the like, to determine whether an output of the data model is a false prediction (e.g., a false positive, a false neutral, a false negative). In this case, if network device 230 detects a false prediction, then network device 230 may update the data model, thereby improving accuracy of subsequent link predictions.
In some implementations, network device 230 may disable the link. For example, assume the link is classified into a class associated with marginal link quality or low link quality. In this case, network device 230 may redirect traffic to avoid traffic flow via the link (e.g., by assigning one or more additional links to support traffic flow associated with the link classified as marginal link quality or low link quality). Additionally, network device 230 may disable the link to allow one or more actions to be executed to improve accuracy of link classification.
As further shown in FIG. 4, process 400 may include performing, after disabling the link, one or more actions associated with improving accuracy of link classification (block 460). For example, network device 230 may determine an actual quality of the link (e.g., by performing a diagnostic test, etc.), update the data model (e.g., by modifying quality coefficient values, threshold values, weight values, etc.), update the class of the link, and/or the like.
In some implementations, network device 230 may determine an actual quality level of the link by performing a diagnostic test. For example, network device 230 may perform a pseudorandom binary sequence (PRBS) test to determine an actual quality level of the link. In this case, network device 230 may perform a PRBS test to determine an actual quality level of the link (e.g., high quality, marginal quality, low quality, etc.), and may compare the actual quality level of the link and the link classification to determine whether the link is correctly classified.
In some cases, the PRBS test may be able to determine an actual quality level of the link by determining how many bits are able to accurately travel through the link (e.g., whether a zero bit stays a zero bit from a first side of the link to a second side of the link, whether a one bit stays a one bit from a first side of the link to a second side of the link, and/or the like). In this case, the PRBS test may use thresholds to determine an actual quality level of the link. If a first threshold amount of bits accurately travel through the link, then the PRBS test may determine that the link is associated with high link quality. If a second threshold amount of bits accurately travel through the link, then the PRBS test may determine that the link is associated with marginal link quality. If a third threshold amount of bits accurately travel through the link, then the PRBS test may determine that the link is associated with low link quality. In this way, the PRBS test may compare the actual quality of the link to the link classification to determine whether the link is correctly classified.
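A threshold-based PRBS check of this kind may be sketched as follows. The example generates a PRBS7 pattern (polynomial x^7 + x^6 + 1) with a linear-feedback shift register, compares the pattern sent with the pattern received, and maps the fraction of correctly received bits to a quality level. The choice of PRBS7, the threshold fractions, and the function names are assumptions for illustration; the implementations described herein do not specify a particular pattern or threshold values.

```python
def prbs7(n_bits, seed=0x7F):
    """Yield n_bits of a PRBS7 sequence (polynomial x^7 + x^6 + 1)."""
    if not 0 < seed <= 0x7F:
        raise ValueError("seed must be a nonzero 7-bit value")
    state = seed
    for _ in range(n_bits):
        new_bit = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | new_bit) & 0x7F
        yield new_bit


def prbs_quality(sent, received, high_min=0.9999, marginal_min=0.999):
    """Map the fraction of correctly received bits to a quality level.

    high_min and marginal_min are hypothetical thresholds.
    """
    if len(sent) != len(received) or not sent:
        raise ValueError("sent and received must be non-empty and equal in length")
    correct = sum(s == r for s, r in zip(sent, received))
    fraction = correct / len(sent)
    if fraction >= high_min:
        return "high"
    if fraction >= marginal_min:
        return "marginal"
    return "low"


pattern = list(prbs7(1270))  # ten full periods of 127 bits

one_error = pattern.copy()
one_error[100] ^= 1  # simulate a single flipped bit

ten_errors = pattern.copy()
for i in range(0, 1000, 100):
    ten_errors[i] ^= 1  # simulate ten flipped bits
```

In this sketch, the result of prbs_quality would then be compared with the class assigned by the data model to decide whether the link is correctly classified.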
In some implementations, network device 230 may update the class of the link based on a result of the diagnostic test. For example, network device 230 may update the class of the link if the diagnostic test indicates that an actual quality of the link is different from a quality associated with the classification.
In some implementations, network device 230 may update the class of the link by updating the data model. For example, network device 230 may modify (e.g., increase, decrease, etc.) one or more of the quality coefficient values based on a result of the diagnostic test. In this case, network device 230 may re-provide the link quality information as input to the data model, which may cause the data model to update the class of the link using the one or more modified quality coefficient values. In this way, network device 230 increases accuracy of subsequent predictions (e.g., relative to not updating the data model).
As an example, assume network device 230 classifies the link into a class associated with high link quality and that the link fails the diagnostic test. In this case, network device 230 may update the class of the link to a class associated with marginal link quality or to a class associated with low link quality based on an output of the diagnostic test. Additionally, network device 230 may modify quality coefficient values that are associated with classifying the link, thereby allowing network device 230 to make more accurate subsequent link quality predictions. In this way, network device 230 may update the data model to eliminate false-positives (e.g., predictions that link quality is high when actual link quality is low).
As another example, assume network device 230 classifies the link into a class associated with marginal link quality and that the link does not receive a diagnostic test score associated with marginal link quality. In this case, network device 230 may update the class of the link to a class associated with high link quality or to a class associated with low link quality. If the link passes the diagnostic test (e.g., receives a score associated with high link quality), then network device 230 may update the data model by modifying quality coefficient values that are associated with classifying the link. If the link fails the diagnostic test (e.g., receives a score associated with low link quality), then network device 230 may update the data model by modifying quality coefficient values that are associated with classifying the link. In this way, network device 230 may update the data model to eliminate false-neutrals (e.g., predictions that link quality is marginal when actual link quality is high or low).
As another example, assume network device 230 classifies the link into a class associated with low link quality and that the link passes the diagnostic test. In this case, network device 230 may update the class of the link to a class associated with high link quality or to a class associated with marginal link quality. If the link passes the diagnostic test with a score associated with high link quality, then network device 230 may update the data model by modifying quality coefficient values that are associated with classifying the link. If the link passes the diagnostic test with a score associated with a marginal quality link, then network device 230 may update the data model by modifying (e.g., by a lower amount than an amount associated with updating the class of the link to a class associated with high link quality) quality coefficient values that are associated with classifying the link. In this way, network device 230 may update the data model to eliminate false-negatives (e.g., predictions that link quality is low when actual link quality is marginal or high).
As another example, assume network device 230 classifies the link and that the diagnostic test confirms that the classification is correct. In this case, network device 230 may update the data model with information indicating that the quality coefficient values used are the correct values. In this way, network device 230 is able to classify subsequent links that have similar link quality information with a higher degree of confidence.
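The four examples above may be summarized in a short sketch that labels the type of false prediction and adjusts the quality coefficient values in proportion to how far the predicted class was from the actual class. The additive update rule, the step size, and the names below are assumptions for illustration; the implementations described herein state only that the coefficient values are modified, and that a smaller modification may apply when the actual class is adjacent to the predicted class.

```python
QUALITY_RANK = {"low": 0, "marginal": 1, "high": 2}

FALSE_PREDICTION_LABEL = {
    "high": "false positive",
    "marginal": "false neutral",
    "low": "false negative",
}


def false_prediction_type(predicted, actual):
    """Return the false-prediction label, or None if the prediction was correct."""
    if predicted == actual:
        return None
    return FALSE_PREDICTION_LABEL[predicted]


def update_coefficients(coefficients, predicted, actual, step=0.1):
    """Nudge coefficients toward the diagnostic result (hypothetical rule).

    A two-class miss (e.g., high predicted, low actual) moves each
    coefficient twice as far as a one-class miss; a correct prediction
    leaves the coefficients unchanged.
    """
    gap = QUALITY_RANK[actual] - QUALITY_RANK[predicted]
    return {name: value + step * gap for name, value in coefficients.items()}


coefficients = {"bit_error_rate": 0.5, "board_temp_c": 0.0}
after_false_positive = update_coefficients(coefficients, "high", "low")
after_false_negative = update_coefficients(coefficients, "low", "marginal")
```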
In some implementations, network device 230 may monitor reclassified links (i.e., links that have had the classification updated). For example, network device 230 may monitor reclassified links by receiving an increased frequency of link quality information (e.g., an increased frequency as compared to a frequency in which link quality information was received during block 440). Additionally, network device 230 may make an increased number of link quality predictions to verify whether the reclassification improved accuracy of link classification. In some implementations, network device 230 may monitor reclassified links over a time interval. In some cases, a duration of the interval may be associated with link quality (e.g., a longer interval for low quality links than for marginal quality links).
As an example, assume network device 230 determines that the link is a high risk link by classifying the link as having low link quality. In this case, network device 230 may monitor and test the link more frequently and more extensively than monitoring and testing associated with lower risk links.
In some implementations, network device 230 may iteratively determine whether the link is correctly classified and may iteratively update the class of the link until the link is correctly classified.
By performing one or more actions associated with improving accuracy of link classification, a data model that has been continuously updated by network device 230 may correctly classify link quality in all (or most) situations, thereby allowing network device 230 to perform one or more actions associated with improving link quality.
As further shown in FIG. 4, process 400 may include performing, after classifying the link, one or more actions associated with improving link quality (block 470). For example, network device 230 may adapt to environmental conditions associated with the link (e.g., a temperature level, a voltage level, etc.), provide a report and/or a recommendation to an interested party (e.g., a network administrator, a technician, etc.), and/or the like.
In some implementations, network device 230 may adapt to environmental conditions associated with the link. For example, network device 230 may identify false predictions (e.g., false positives, false neutrals, false negatives, etc.), and may adapt to environmental conditions by retraining the data model after identifying the false predictions. In this way, network device 230 may retrain the data model to adapt to environmental conditions of a particular network device.
In some implementations, network device 230 may implement a forecasting technique with a separate data model. For example, network device 230 may train a separate data model that includes a set of characteristics associated with link faults (e.g., including data leading up to link failure). In this case, network device 230 may provide link quality information associated with an active link as input for the separate data model, which may cause the separate data model to output a projected time period at which the link is predicted to degrade in link quality past a particular link quality threshold.
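As a simplified stand-in for the separate data model, the following sketch fits a straight line to recent measurements of a link quality metric by ordinary least squares and extrapolates the time at which the fitted line reaches a threshold. This linear extrapolation is an assumption made for illustration and is not the trained forecasting model described above; the function name, the use of log10(BER) as the metric, and the sample values are hypothetical.

```python
def forecast_threshold_crossing(times, values, threshold):
    """Extrapolate when a rising metric is projected to reach the threshold.

    Fits values = intercept + slope * time by least squares. Returns the
    projected time, or None if the fitted trend is not rising. A returned
    time earlier than the last sample means the fit is already past the
    threshold.
    """
    n = len(times)
    if n < 2 or n != len(values):
        raise ValueError("need at least two paired samples")
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    var_t = sum((t - mean_t) ** 2 for t in times)
    if var_t == 0:
        raise ValueError("sample times must not all be equal")
    slope = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values)) / var_t
    if slope <= 0:
        return None  # the metric is not degrading under this fit
    intercept = mean_v - slope * mean_t
    return (threshold - intercept) / slope


hours = [0.0, 1.0, 2.0, 3.0]
log10_ber = [-12.0, -11.0, -10.0, -9.0]  # BER worsening by a decade per hour
projected_hour = forecast_threshold_crossing(hours, log10_ber, threshold=-6.0)
```

With these sample values, the fitted slope is one decade per hour, so the line is projected to reach a log10(BER) of -6 at hour 6.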
In some implementations, network device 230 may blacklist the link (i.e., prevent the link from being able to actively support traffic flow). For example, network device 230 may blacklist a low quality link to prevent loss of traffic until the link and/or the board is repaired and/or replaced. In some cases, network device 230 may determine whether the link has a history of being classified into a class associated with low link quality, and may blacklist the link if the link has been classified into the class associated with low link quality more than a threshold number of times. In some cases, network device 230 may blacklist the link by providing an indication to a software module that controls link traffic that the link is no longer able to support traffic flow.
As an example, assume the link is classified into a class associated with low link quality. In this case, if the link has a history of being classified into the class associated with low link quality, then network device 230 may blacklist the link, thereby preventing the link from actively supporting traffic flow. If the link does not have a history of being classified into the class associated with low link quality, then network device 230 may perform a diagnostic test to determine an actual quality of the link, as described above. In this way, network device 230 is able to reduce link faults (e.g., packet loss, signal attenuation, etc.) within network device 230 until the link and/or the board is repaired and/or replaced.
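The history-based blacklisting decision may be sketched as a per-link counter that is compared against a threshold each time a link is classified as low quality. The class name, method name, return values, and default threshold are hypothetical.

```python
from collections import Counter


class LinkBlacklist:
    """Track low-quality classifications and blacklist repeat offenders."""

    def __init__(self, threshold=3):
        self.threshold = threshold  # hypothetical default
        self.low_counts = Counter()
        self.blacklisted = set()

    def record_low_classification(self, link_id):
        """Return the next action for a link just classified as low quality."""
        self.low_counts[link_id] += 1
        # Blacklist only when the count exceeds the threshold.
        if self.low_counts[link_id] > self.threshold:
            self.blacklisted.add(link_id)
            return "blacklist"
        return "run_diagnostic"


tracker = LinkBlacklist(threshold=2)
actions = [tracker.record_low_classification("inter-board-0") for _ in range(3)]
```

In this sketch, a link without a sufficient history is sent to the diagnostic test, and a link whose count exceeds the threshold is added to the blacklisted set.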
Additionally, or alternatively, network device 230 may provide, to network management device 240, a recommendation to repair or replace one or more hardware components and/or software modules associated with the link. For example, network device 230 may generate a recommendation that includes link quality information for the link, results of a diagnostic test, and/or instructions indicating a particular action to be performed (e.g., repair the link, repair the board, replace the link, replace the board, etc.). In this case, network device 230 may provide the recommendation to network management device 240. In some implementations, network device 230 may automatically schedule an appointment to have an interested party (e.g., a technician) repair a hardware component and/or a software module associated with the link.
In this way, network device 230 may perform one or more actions associated with improving link quality. Additionally, network device 230 predicting link faults prior to the occurrence of the faults prevents traffic loss. Furthermore, by performing one or more actions associated with improving accuracy of link classification and/or link quality, network device 230 improves efficiency and reliability of network communications.
Although FIG. 4 shows example blocks of process 400, in some implementations, process 400 may include additional blocks, fewer blocks, different blocks, or differently arranged blocks than those depicted in FIG. 4. Additionally, or alternatively, two or more of the blocks of process 400 may be performed in parallel.
FIGS. 5A-5C are diagrams of one or more example implementations relating to the example process shown in FIG. 4. FIG. 5A shows an example implementation where network device 230 may identify a false positive, and may update the data model to improve accuracy of link classification. FIG. 5B shows an example implementation where network device 230 may identify a false neutral, and may update the data model to improve accuracy of link classification. FIG. 5C shows an example implementation where network device 230 may identify a false negative, and may update the data model to improve accuracy of link classification. In this way, network device 230 may continue to improve link classification until the data model is able to correctly classify all (or close to all) of the links, thereby ensuring that actions may be taken to improve link quality.
As shown in FIG. 5A, network device 230 may identify false positives (e.g., the data model may falsely classify a link into a class associated with high link quality when actual link quality is marginal or low). As shown by reference number 502, network device 230 may classify the link into a class associated with high link quality. In this case, network device 230 may determine whether actual link errors (e.g., hardware errors) are associated with the link by executing an FEC technique, a CRC technique, or the like.
As shown by reference number 504, if network device 230 determines that there are no FEC errors, CRC errors, or the like, then network device 230 may idle and/or monitor the link. In this case, the FEC technique, the CRC technique, or the like, serves to verify that the link is correctly classified as having high link quality.
As shown by reference number 506, if network device 230 determines that there are FEC errors, CRC errors, or the like, then network device 230 may update the data model. In this case, network device 230 may update the data model by modifying one or more quality coefficient values. In this case, modifying the one or more quality coefficient values associated with the link quality information may allow the data model to correctly classify links with similar link quality information when the data model is used for subsequent classifications.
In this way, network device 230 is able to identify false positives, and may update the data model to improve the accuracy of link classification.
As shown in FIG. 5B, network device 230 may identify false neutrals (e.g., the data model may falsely classify a link into a class associated with marginal link quality when actual link quality is high or low). As shown by reference number 508, network device 230 may classify a link into a class associated with marginal link quality. In this case, and as shown by reference number 510, network device 230 may disable, monitor, and test the link, in the same manner described above. As shown by reference number 512, if the link passes the test with a high passing score (shown as a “high pass”), then network device 230 may update the class of the link to a class associated with high link quality. In this case, network device 230 may re-enable the link to allow the link to continue supporting network traffic flow.
Additionally, and as shown by reference number 514, network device 230 may update the data model. For example, network device 230 may update the data model by modifying one or more quality coefficient values. In this case, modifying the one or more quality coefficient values associated with the link quality information may allow the data model to correctly classify links with similar link quality information when the data model is used for subsequent classifications.
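The disclosure does not specify how the quality coefficient values are modified. As one possible illustration only, the following Python sketch assumes that the data model computes a linear score from named link quality features and nudges each coefficient toward the result observed by the diagnostic test. The functions `score` and `update_coefficients`, the feature names, and the learning rate are assumptions made for this example.

```python
from typing import Dict


def score(coeffs: Dict[str, float], features: Dict[str, float]) -> float:
    """Linear quality score: sum of coefficient * feature value."""
    return sum(c * features.get(name, 0.0) for name, c in coeffs.items())


def update_coefficients(
    coeffs: Dict[str, float],
    features: Dict[str, float],
    observed: float,
    rate: float = 0.1,
) -> Dict[str, float]:
    """Return new coefficients moved toward the observed quality.

    Uses a simple error-proportional correction (a least-mean-squares
    style step), which is an assumed technique, not one named in the source.
    """
    error = observed - score(coeffs, features)
    return {
        name: c + rate * error * features.get(name, 0.0)
        for name, c in coeffs.items()
    }


if __name__ == "__main__":
    coeffs = {"snr": 0.5, "fec_rate": -0.2}      # hypothetical coefficients
    features = {"snr": 1.0, "fec_rate": 2.0}     # hypothetical measurements
    updated = update_coefficients(coeffs, features, observed=1.0)
    print(score(coeffs, features), score(updated, features))
```

After the update, links with similar link quality information receive a score closer to the quality that the diagnostic test observed.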
As shown by reference number 516, if the link fails the diagnostic test, then network device 230 may update the class of the link to a class associated with low link quality. In this case, and as shown by reference number 518, network device 230 may update the data model. For example, network device 230 may update the data model by modifying one or more quality coefficient values. This may allow the data model to correctly classify links with similar link quality information when the data model is used for subsequent classifications.
As shown by reference number 520, if the link passes the test with a low passing score (shown as a “low pass”), then network device 230 may update the class of the link to a class associated with marginal link quality. In this case, network device 230 may update the data model by updating one or more values associated with confidence scores, as described above.
In this way, network device 230 is able to identify false neutrals, and may update the data model to improve the accuracy of link classification.
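As one non-limiting illustration, the three outcomes of the diagnostic test in FIG. 5B may be sketched in Python as follows. The score boundaries `FAIL_BELOW` and `HIGH_PASS_AT`, the function `resolve_marginal`, and the update labels are hypothetical; the disclosure does not give numeric values for a fail, a low pass, or a high pass.

```python
from enum import Enum
from typing import Tuple


class LinkClass(Enum):
    HIGH = "high"
    MARGINAL = "marginal"
    LOW = "low"


# Hypothetical score boundaries; the source gives no numeric values.
FAIL_BELOW = 0.5
HIGH_PASS_AT = 0.9


def resolve_marginal(test_score: float) -> Tuple[LinkClass, str]:
    """Map a diagnostic test score to (resulting class, kind of model update)."""
    if test_score < FAIL_BELOW:
        # Fail (516, 518): reclassify as low quality, modify coefficients.
        return LinkClass.LOW, "quality_coefficients"
    if test_score >= HIGH_PASS_AT:
        # High pass (512, 514): reclassify as high quality, modify coefficients.
        return LinkClass.HIGH, "quality_coefficients"
    # Low pass (520): class remains marginal, update confidence score values.
    return LinkClass.MARGINAL, "confidence_scores"


if __name__ == "__main__":
    for s in (0.2, 0.7, 0.95):
        print(s, resolve_marginal(s))
```

Only the high pass and fail outcomes indicate a false neutral; the low pass outcome confirms the marginal class and affects only the confidence score values.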
As shown in FIG. 5C, network device 230 may identify false negatives (e.g., the data model may falsely classify a link into a class associated with low link quality when actual link quality is marginal or high). As shown by reference number 522, network device 230 may classify a link into a class associated with low link quality.
As shown by reference number 524, network device 230 may determine whether the link has received a threshold number of classifications associated with low link quality. In this case, network device 230 may compare a current number of times the link is classified into a class associated with low link quality with a threshold number of classifications associated with low link quality. As shown by reference number 526, if the current number of times satisfies the threshold number of classifications, then network device 230 may blacklist the link, as described elsewhere herein. In this way, network device 230 is able to prevent links that are repeatedly classified as low quality from supporting traffic flow.
As shown by reference number 528, if the current number of times the link is classified into a class associated with low link quality does not satisfy the threshold number of classifications, then network device 230 may disable, monitor, and test the link (e.g., using a diagnostic test), as described elsewhere herein. As shown by reference number 530, if the link passes the diagnostic test, then network device 230 may update the class of the link to a class associated with high link quality. In this case, and as shown by reference number 532, network device 230 may update the data model by modifying one or more quality coefficient values. This may allow the data model to correctly classify a subsequent link with similar link quality information.
As shown by reference number 534, if the link fails the diagnostic test, then network device 230 may perform one or more actions associated with repairing the link. For example, network device 230 may provide a request to a network management device to repair the link, repair a board associated with the link, replace the board, and/or the like.
In this way, network device 230 is able to identify false negatives, and may update the data model to improve the accuracy of link classification.
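As one non-limiting illustration, the decision flow of FIG. 5C may be sketched in Python as follows. The function `handle_low_quality`, its parameters, and the returned action strings are hypothetical. The sketch assumes that the threshold is satisfied when the count exceeds it, and it passes the diagnostic test as a callable so that the test runs only when the link is not blacklisted.

```python
from typing import Callable


def handle_low_quality(
    low_count: int,
    threshold: int,
    passes_test: Callable[[], bool],
) -> str:
    """Choose an action for a link classified as low quality.

    low_count   -- times the link has been classified as low quality
    threshold   -- hypothetical blacklist threshold
    passes_test -- runs the diagnostic test on the disabled link
    """
    if low_count > threshold:
        # Reference number 526: repeatedly low-quality link is blacklisted.
        return "blacklist"
    if passes_test():
        # Reference numbers 530, 532: false negative; reclassify as high
        # quality and update the data model.
        return "reclassify_high_and_update_model"
    # Reference number 534: low quality confirmed; request a repair.
    return "request_repair"


if __name__ == "__main__":
    print(handle_low_quality(5, 3, lambda: True))
    print(handle_low_quality(1, 3, lambda: True))
    print(handle_low_quality(1, 3, lambda: False))
```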
As indicated above, FIGS. 5A-5C are provided merely as examples. Other examples are possible and may differ from what was described with regard to FIGS. 5A-5C.
In this way, network device 230 is able to identify false positives, false neutrals, and/or false negatives, and may update the data model to improve the accuracy of link classification. By correctly classifying links, network device 230 is able to identify situations where a link is beginning to degrade in quality, and is able to take corrective actions needed to repair the link, repair the board, replace the link, or replace the board, as described elsewhere herein. In this way, network device 230 improves network performance relative to taking corrective action after a link fails.
The foregoing disclosure provides illustration and description, but is not intended to be exhaustive or to limit the implementations to the precise form disclosed. Modifications and variations are possible in light of the above disclosure or may be acquired from practice of the implementations.
As used herein, the term component is intended to be broadly construed as hardware, firmware, and/or a combination of hardware and software.
Some implementations are described herein in connection with thresholds. As used herein, satisfying a threshold may refer to a value being greater than the threshold, more than the threshold, higher than the threshold, greater than or equal to the threshold, less than the threshold, fewer than the threshold, lower than the threshold, less than or equal to the threshold, equal to the threshold, etc.
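Because "satisfying a threshold" may refer to any of several comparisons, an implementation might make the comparison configurable. The following Python sketch is one hypothetical way to do so; the `COMPARATORS` table, the mode names, and the function `satisfies` are assumptions made for this example.

```python
import operator

# Hypothetical mode names mapped to standard comparison functions.
COMPARATORS = {
    "gt": operator.gt,  # greater than the threshold
    "ge": operator.ge,  # greater than or equal to the threshold
    "lt": operator.lt,  # less than the threshold
    "le": operator.le,  # less than or equal to the threshold
    "eq": operator.eq,  # equal to the threshold
}


def satisfies(value: float, threshold: float, mode: str = "ge") -> bool:
    """Return True when value satisfies threshold under the chosen mode."""
    return COMPARATORS[mode](value, threshold)


if __name__ == "__main__":
    print(satisfies(3, 3))        # default mode "ge"
    print(satisfies(3, 3, "gt"))
    print(satisfies(2, 3, "lt"))
```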
It will be apparent that systems and/or methods, described herein, may be implemented in different forms of hardware, firmware, or a combination of hardware and software. The actual specialized control hardware or software code used to implement these systems and/or methods is not limiting of the implementations. Thus, the operation and behavior of the systems and/or methods were described herein without reference to specific software code—it being understood that software and hardware may be designed to implement the systems and/or methods based on the description herein.
Even though particular combinations of features are recited in the claims and/or disclosed in the specification, these combinations are not intended to limit the disclosure of possible implementations. In fact, many of these features may be combined in ways not specifically recited in the claims and/or disclosed in the specification. Although each dependent claim listed below may directly depend on only one claim, the disclosure of possible implementations includes each dependent claim in combination with every other claim in the claim set.
No element, act, or instruction used herein should be construed as critical or essential unless explicitly described as such. Also, as used herein, the articles “a” and “an” are intended to include one or more items, and may be used interchangeably with “one or more.” Furthermore, as used herein, the term “set” is intended to include one or more items (e.g., related items, unrelated items, a combination of related and unrelated items, etc.), and may be used interchangeably with “one or more.” Where only one item is intended, the term “one” or similar language is used. Also, as used herein, the terms “has,” “have,” “having,” or the like are intended to be open-ended terms. Further, the phrase “based on” is intended to mean “based, at least in part, on” unless explicitly stated otherwise.

Claims (20)

What is claimed is:
1. A device, comprising:
a memory; and
one or more processors to:
receive a trained data model;
determine, after receiving the data model, link quality information associated with a link that supports traffic;
classify the link by using the link quality information as input for the data model,
the data model to classify the link into a class, of a set of classes,
associated with measuring link quality;
disable the link after classifying the link into the class;
perform, after disabling the link, a diagnostic test to identify that the link is not correctly classified;
update the class of the link to another class using the data model and after performing the diagnostic test; and
update the data model by modifying one or more quality values that are associated with at least one of classifying the link or updating the class of the link.
2. The device of claim 1, where the one or more processors, when updating the class of the link to the other class, are to:
update the class of the link to a class associated with a low link quality when the link fails the diagnostic test; and
where the one or more processors, when updating the data model, are to:
update the data model based on the class associated with the low link quality.
3. The device of claim 1, where the one or more processors, when updating the class of the link to the other class, are to:
update the class of the link to a class associated with a high link quality when the link passes the diagnostic test,
the link passes the diagnostic test with a score above a threshold; and
where the one or more processors, when updating the data model, are to:
update the data model based on the class associated with the high link quality.
4. The device of claim 1, where the one or more processors, when determining the link quality information associated with the link, are to:
measure the link quality by identifying errors in data transmission using one or more of:
a forward error correction value, or
a cyclical redundancy check value.
5. The device of claim 1, where the one or more processors are further to:
classify the link as a low quality link a particular quantity of times; and
blacklist the link based on the particular quantity of times exceeding a threshold quantity of times.
6. The device of claim 1, where the one or more processors, when updating the data model, are to:
update the data model by modifying one or more quality coefficient values that are associated with the link quality information.
7. The device of claim 1, where the one or more processors, when determining the link quality information associated with the link, are to:
monitor data traveling through the link to determine a bit error ratio value; and
where the one or more processors, when classifying the link, are to:
classify the link based on the bit error ratio value.
8. A non-transitory computer readable medium storing instructions, the instructions comprising:
one or more instructions, that when executed by one or more processors, cause the one or more processors to:
receive a trained data model;
determine, after receiving the data model, link quality information associated with a link that supports traffic;
classify the link by using the link quality information as input for the data model,
the data model to classify the link into a class, of a set of classes,
associated with measuring link quality;
disable the link after classifying the link into the class;
perform, after disabling the link, a diagnostic test to identify that the link is not correctly classified;
update the class of the link to another class using the data model and after performing the diagnostic test; and
update the data model by modifying one or more quality values that are associated with at least one of classifying the link or updating the class of the link.
9. The non-transitory computer readable medium of claim 8, where the one or more instructions, that cause the one or more processors to update the class of the link to the other class, cause the one or more processors to:
update the class of the link to a class associated with a low link quality when the link fails the diagnostic test; and
where the one or more instructions, that cause the one or more processors to update the data model, cause the one or more processors to:
update the data model based on the class associated with the low link quality.
10. The non-transitory computer readable medium of claim 8, where the one or more instructions, that cause the one or more processors to update the class of the link to the other class, cause the one or more processors to:
update the class of the link to a class associated with a high link quality when the link passes the diagnostic test,
the link passes the diagnostic test with a score above a threshold; and
where the one or more instructions, that cause the one or more processors to update the data model, cause the one or more processors to:
update the data model based on the class associated with the high link quality.
11. The non-transitory computer readable medium of claim 8, where the one or more instructions, that cause the one or more processors to determine the link quality information associated with the link, cause the one or more processors to:
measure the link quality by identifying errors in data transmission using one or more of:
a forward error correction value, or
a cyclical redundancy check value.
12. The non-transitory computer readable medium of claim 8, where the one or more instructions further cause the one or more processors to:
classify the link as a low quality link a particular quantity of times; and
blacklist the link based on the particular quantity of times exceeding a threshold quantity of times.
13. The non-transitory computer readable medium of claim 8, where the one or more instructions, that cause the one or more processors to update the data model, cause the one or more processors to:
update the data model by modifying one or more quality coefficient values that are associated with the link quality information.
14. The non-transitory computer readable medium of claim 8, where the one or more instructions, that cause the one or more processors to determine the link quality information associated with the link, cause the one or more processors to:
monitor data traveling through the link to determine a bit error ratio value; and
where the one or more instructions, that cause the one or more processors to classify the link, cause the one or more processors to:
classify the link based on the bit error ratio value.
15. A method, comprising:
receiving, by a device, a trained data model;
determining, by the device and after receiving the data model, link quality information associated with a link that supports traffic;
classifying, by the device, the link by using the link quality information as input for the data model,
the data model to classify the link into a class, of a set of classes, associated with measuring link quality;
disabling, by the device, the link after classifying the link into the class;
performing, by the device and after disabling the link, a diagnostic test to identify that the link is not correctly classified;
updating, by the device, the class of the link to another class using the data model and after performing the diagnostic test; and
updating, by the device, the data model by modifying one or more quality values that are associated with at least one of classifying the link or updating the class of the link.
16. The method of claim 15, where updating the class of the link to the other class comprises:
updating the class of the link to a class associated with a low link quality when the link fails the diagnostic test; and
where updating the data model comprises:
updating the data model based on the class associated with the low link quality.
17. The method of claim 15, where updating the class of the link to the other class comprises:
updating the class of the link to a class associated with a high link quality when the link passes the diagnostic test,
the link passes the diagnostic test with a score above a threshold; and
where updating the data model comprises:
updating the data model based on the class associated with the high link quality.
18. The method of claim 15, further comprising:
classifying the link as a low quality link a particular quantity of times; and
blacklisting the link based on the particular quantity of times exceeding a threshold quantity of times.
19. The method of claim 15, where updating the data model comprises:
updating the data model by modifying one or more quality coefficient values that are associated with the link quality information.
20. The method of claim 15, where determining the link quality information associated with the link comprises:
monitoring data traveling through the link to determine a bit error ratio value; and
where classifying the link comprises:
classifying the link based on the bit error ratio value.
US16/406,251 2017-08-01 2019-05-08 Using machine learning to monitor link quality and predict link faults Active US10805174B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US16/406,251 US10805174B2 (en) 2017-08-01 2019-05-08 Using machine learning to monitor link quality and predict link faults

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US15/666,015 US10298465B2 (en) 2017-08-01 2017-08-01 Using machine learning to monitor link quality and predict link faults
US16/406,251 US10805174B2 (en) 2017-08-01 2019-05-08 Using machine learning to monitor link quality and predict link faults

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US15/666,015 Continuation US10298465B2 (en) 2017-08-01 2017-08-01 Using machine learning to monitor link quality and predict link faults

Publications (2)

Publication Number Publication Date
US20190268240A1 US20190268240A1 (en) 2019-08-29
US10805174B2 US10805174B2 (en) 2020-10-13

Family

ID=63254496

Family Applications (2)

Application Number Title Priority Date Filing Date
US15/666,015 Active 2037-08-04 US10298465B2 (en) 2017-08-01 2017-08-01 Using machine learning to monitor link quality and predict link faults
US16/406,251 Active US10805174B2 (en) 2017-08-01 2019-05-08 Using machine learning to monitor link quality and predict link faults

Family Applications Before (1)

Application Number Title Priority Date Filing Date
US15/666,015 Active 2037-08-04 US10298465B2 (en) 2017-08-01 2017-08-01 Using machine learning to monitor link quality and predict link faults

Country Status (3)

Country Link
US (2) US10298465B2 (en)
EP (1) EP3439241B1 (en)
CN (2) CN109327347B (en)

Also Published As

Publication number Publication date
EP3439241B1 (en) 2020-12-16
CN115225519B (en) 2023-12-01
US10298465B2 (en) 2019-05-21
US20190044824A1 (en) 2019-02-07
EP3439241A1 (en) 2019-02-06
US20190268240A1 (en) 2019-08-29
CN109327347A (en) 2019-02-12
CN109327347B (en) 2022-07-19
CN115225519A (en) 2022-10-21

Legal Events

Date Code Title Description
AS Assignment

Owner name: JUNIPER NETWORKS, INC., CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:YADAV, ALAM;N, MADHAVA;SANYAL, SAIKAT;SIGNING DATES FROM 20170731 TO 20170801;REEL/FRAME:049114/0124

FEPP Fee payment procedure

Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

STPP Information on status: patent application and granting procedure in general

Free format text: NON FINAL ACTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER

STPP Information on status: patent application and granting procedure in general

Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS

STPP Information on status: patent application and granting procedure in general

Free format text: AWAITING TC RESP, ISSUE FEE PAYMENT VERIFIED

STCF Information on status: patent grant

Free format text: PATENTED CASE

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 4