GB2561181A - Network Fault Discovery - Google Patents

Network Fault Discovery

Info

Publication number
GB2561181A
Authority
GB
United Kingdom
Prior art keywords
access
response
network
access point
test
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
GB1705353.9A
Other versions
GB201705353D0 (en
GB2561181B (en
Inventor
Abouelmaati Dalia
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
British Telecommunications PLC
Original Assignee
British Telecommunications PLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by British Telecommunications PLC
Priority to GB1705353.9A
Publication of GB201705353D0
Publication of GB2561181A
Application granted
Publication of GB2561181B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/40 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass for recovering from a failure of a protocol instance or entity, e.g. service redundancy protocols, protocol state redundancy or protocol service redirection
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0805 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability
    • H04L43/0811 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability by checking connectivity
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0631 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0677 Localisation of faults
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0805 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/10 Active monitoring, e.g. heartbeat, ping or trace-route
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L61/00 Network arrangements, protocols or services for addressing or naming
    • H04L61/50 Address allocation
    • H04L61/5007 Internet protocol [IP] addresses
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L61/00 Network arrangements, protocols or services for addressing or naming
    • H04L61/50 Address allocation
    • H04L61/5046 Resolving address allocation conflicts; Testing of addresses
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/50 Testing arrangements

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Environmental & Geological Engineering (AREA)
  • Health & Medical Sciences (AREA)
  • Cardiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Security & Cryptography (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

Network addresses which are frequently accessed by user terminals are monitored by a network access point to which the terminals are connected, and the network access point 2 then makes attempts (50, fig 5) to access the network addresses periodically and makes reports to a network management system (53, fig 5) of any such network addresses that fail to respond. This increases the likelihood of detection and reporting of failures of active addresses before the user has need of them. The access point may comprise a counter 25 for counting access attempts to a network address, and for controlling a test generation processor 27 to generate access requests in response to the counter identifying a predetermined number of requests in a predetermined time. The test generation processor may be configured to generate test access requests at staggered times of the day and may be controlled by a response monitor 28 to generate test access requests to a network address at a first interval if the response monitor records a successful response to the previous request, and a second shorter interval if the response monitor records a failed response to the previous request.

Description

(54) Title of the Invention: Network Fault Discovery
Abstract Title: Network fault discovery using access points to generate test requests to network addresses.
At least one drawing originally filed was informal and the print reproduced here is taken from a later filed formal copy.
[Drawings: four sheets (1/4 to 4/4), images GB2561181A_D0001 to GB2561181A_D0006. The only legible label is "Radio Basestation-Detection Process".]
Network Fault Discovery
The invention relates to monitoring of a network to identify outages of resources associated with network addresses.
It is known to monitor individual network addresses to detect outages, as described for example in WO2016/118899 and US2011/208992. In these examples, addresses are monitored periodically by network gateways to identify any which are failing to respond, indicating a possible failure of the server at that address or the communications links connecting it to the rest of the network. However, this involves an additional communications overhead in transmitting the test messages and responses. Moreover, it does not take account of how significant such a failure may be, as the failed address may relate to a server which has fallen into disuse and is rarely accessed by access requests from real users, as distinct from the test messages.
Individual user terminals could report access failures, but this would only identify problems retrospectively, and only when a user terminal is connected and a request for access is made. It is desirable to identify outages of resources before those resources are requested, so that they can be remedied before the resource is required.
According to the invention, there is provided a process for recording access attempts to network addresses made by user terminals through an access point, attempting access from the access point to network addresses for which successful access attempts have been recorded in a predetermined period, and reporting to a network management system any such network addresses that fail to respond.
Preferably, the network management system is responsive to multiple failure reports from different access points relating to the same network address by recording occurrence of a potential fault condition associated with the network address. It may also be responsive to multiple failures of access attempts from a given access point by recording occurrence of a potential fault condition associated with the access point.
In a preferred embodiment, access attempts by user terminals to each address are counted during a predetermined period, and the access point attempts access periodically to addresses which have been recorded as having been accessed through the access point more than a predetermined number of times. Access attempts may be made at different times of day, in order to identify failure modes which have a diurnal pattern, for example because of overloads at times of peak demand.
Following a first access attempt, the intervals between subsequent access attempts may be selected according to whether the first access attempt fails or succeeds.
To minimise communication overhead, alerts may be transmitted to the remote management system only if a response received by the response monitor in response to a test access request to a network address is different from a preceding response received by the response monitor in response to a previous test access request to the same network address.
The invention also provides a communications access point for connecting one or more user terminals to a data communications network, having an access request monitor configured to operate the process of the invention.
The invention makes use of individual users' network access points to monitor network addresses in regular use. Each access point records network addresses regularly accessed from the access point, and periodically checks those addresses to see if they are still active. If any fail to respond, this is reported to a network management entity which co-ordinates the data to allow collection of data from multiple access points which can be used to identify problems (e.g. denial of service attacks, system outages etc.). This allows the network operator to identify potential problems before the customer is inconvenienced by them. Thus a distributed mechanism can be provided for monitoring access requests, which will only notify the network if needed, preventing the overload of the network.
Reporting access request history only when an individual access point detects a problem reduces network overhead and allows monitoring to be concentrated on websites that are attracting the most interest, as compared with others that are dormant. It can also identify if an access problem is specific to an individual website/access point pair.
The co-ordination of data at network level also allows problems specific to an individual access point to be identified - in particular if access requests from an individual access point to multiple addresses are resulting in errors, this may be indicative of a problem with the backhaul connection to that access point, or with a user terminal connected to that access point, rather than with the addresses to which the requests are directed.
Another benefit is that with the huge increase of network access devices, it is more feasible to check only the most frequently-used websites rather than checking everything all the time, thereby lessening the communication overhead.
Embodiments of the invention will now be described, by way of example, with reference to the drawings, in which:
- Figure 1 depicts the network entities which co-operate to perform the invention
- Figure 2 depicts a radio base station configured to operate according to the invention
- Figure 3 depicts a network management entity configured to operate according to the invention
- Figure 4 depicts a first stage in a process according to the invention
- Figure 5 depicts a second stage in a process according to the invention
Figure 1 depicts in schematic form a simplified network 6 connected to a network management system 3, an access point 2 and a target website server 5. A user terminal 1 can connect to the network 6 through the access point 2, and thereby communicate with the target website 5 and the management system 3. It will be recognised that in any practical system there will be many access points 2 and website servers 5 interconnected through the network 6, and each access point 2 may be connected to multiple user terminals 1.
The access point 2 may be a domestic wireless router, femtocell or enterprise femtocell connected wirelessly to the user terminal 1, or they may have a wired connection (e.g. Ethernet). A wireless access point is depicted schematically in more detail in Figure 2. The functional elements depicted in Figure 2 are typically embodied in software or firmware. The access point 2 has a wireless interface 20 for communication with user terminals 1, and a network interface 22 for connection to a data communications network such as the Internet. Data packets are translated from one medium to the other by a modem 23, and routing processes such as reading and writing address packets are controlled by a routing function 21.
In addition to these conventional functions, the access point operates a number of additional functions in accordance with an embodiment of the invention. A monitoring system 24 intercepts access requests generated by user terminals connected by the access point, and stores a record of such requests in a memory store 26. A counter 25 is used to determine the number of access requests made to each individual address, and this is used to update the store.
A test generation system 27 is arranged to transmit access requests periodically to the addresses stored in the data store, by way of the modem 23 and network interface 22. A response monitor system 28 intercepts responses to these access requests, and controls an alert generation system 29 which is configured to process messages received over the network interface 22 in response to such requests by transmitting reports by way of the modem 23 and network 6 to the management entity 3.
Figure 3 depicts a network management entity 3, which may be embodied in software, which co-operates with a number of access points 2 of the kind depicted in Figure 2. The functional elements include a report reception function 30 which is configured to receive reports from the various access points about possible outages of network based server platforms such as the one depicted at 5 in Figure 1. Such reports are stored in a database 31 for retrieval by a retrieval unit 32 which analyses the reports to identify patterns in the failure reports which may indicate a fault with a server 5 or with a user terminal 1, and reports to an appropriate fault management system 33, 34 accordingly.
The process by which the base station 2 operates is depicted in Figure 4 and Figure 5, which illustrate two stages in the process. Figure 4 depicts a method for selecting which network addresses are to be monitored, and Figure 5 depicts the actual monitoring process. It should be noted that these processes can run concurrently, and in particular, the list of addresses to be monitored is continuously updated.
As shown in Figure 4, the request monitor unit 24 in radio base station 2 detects access requests made by the users and records a list of URLs (Internet Protocol addresses) that are regularly used by the customer. To do this it first stores the address identities in a temporary counting store 25 (step 40). At each such successful access attempt, a comparison is made with addresses already in the store 25 (step 41) and any address which occurs more than a predetermined number of times within a specified period t (for example five times in seven days) is forwarded to the main memory store 26. Each successful access attempt is removed from the temporary store (step 42) once the time window t has expired for that access attempt.
Addresses may be removed from the permanent store 26 if they have not been accessed for a longer predetermined period.
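The selection stage of Figure 4 can be illustrated with a minimal Python sketch. It is not taken from the specification: the class name, method names and default values are hypothetical, the idle-expiry period is an arbitrary placeholder for the "longer predetermined period", and promotion is assumed to happen on the fifth access within the window, following the "five times in seven days" example.

```python
from collections import defaultdict, deque


class FrequentAddressTracker:
    """Illustrative stand-in for the temporary store 25 and main store 26."""

    def __init__(self, threshold=5, window_s=7 * 24 * 3600,
                 idle_expiry_s=30 * 24 * 3600):
        self.threshold = threshold          # accesses needed for promotion (assumed)
        self.window_s = window_s            # period t, in seconds
        self.idle_expiry_s = idle_expiry_s  # assumed "longer predetermined period"
        self._recent = defaultdict(deque)   # temporary store: address -> timestamps
        self.monitored = {}                 # main store: address -> last access time

    def record_access(self, address, now):
        """Record one successful user access at time `now` (seconds)."""
        stamps = self._recent[address]
        stamps.append(now)
        # Step 42: discard attempts whose time window has expired.
        while stamps and now - stamps[0] > self.window_s:
            stamps.popleft()
        # Step 41: promote, or refresh the last-access time if already promoted.
        if address in self.monitored or len(stamps) >= self.threshold:
            self.monitored[address] = now

    def expire_idle(self, now):
        """Remove monitored addresses not accessed for `idle_expiry_s` seconds."""
        for address, last in list(self.monitored.items()):
            if now - last > self.idle_expiry_s:
                del self.monitored[address]
                self._recent.pop(address, None)
```

Timestamps are passed in rather than read from a clock so that the behaviour can be exercised deterministically.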
As shown in Figure 5, the test generation system 27 identifies the addresses currently in the store 26 and tests each one from time to time to determine if it is still active, by sending an access request to each one (step 50) by way of the modem 23 and network interface 22. It is preferable that this is done when traffic is otherwise quiet, but it may be desirable to make successive tests on a particular website at different times of day as there may be a diurnal pattern of availability of certain websites which would not be detected if the test were made at the same time each day. The requests are flagged with an address corresponding to the response monitor unit 28 so that the responses are not forwarded to any of the user terminals.
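One simple way to obtain tests at different times of day is to space successive tests slightly more than a day apart. The sketch below is an assumption for illustration only: the function name and the five-hour step are hypothetical, and it does not model the preference for quiet traffic periods.

```python
from datetime import datetime, timedelta


def staggered_test_times(first_test, count, step_hours=5):
    """Return `count` test times spaced one day plus `step_hours` apart,
    so that the hour of day shifts with every successive test."""
    period = timedelta(days=1, hours=step_hours)
    return [first_test + i * period for i in range(count)]
```

For example, starting at 02:00 with the default step, the next three tests fall at 07:00, 12:00 and 17:00 on the following days.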
The response monitor unit 28 is alerted to the requests and responds accordingly when a response is received, according to the process depicted in Figure 5. If any of these URLs 5 is not responding, or responds with an error message (step 51), a report is generated by the alert generation unit 29 and sent by way of the modem 23, network interface 22, and Internet 6 to the management entity 3. The input 30 of the management entity 3 receives reports from multiple access points and stores them in a store 31 for analysis.
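The classification of a test response at step 51 might be sketched as follows. The specification does not define a transport or a report format, so everything here is hypothetical: `fetch` is a caller-supplied function assumed to return an HTTP-like status code and to raise `OSError` (which includes timeouts) when no response arrives, and the report is an ad hoc dictionary.

```python
def probe(address, fetch):
    """Classify one test request as 'ok', 'error' or 'no_response'.

    `fetch` is a hypothetical callable: it returns a numeric status code,
    or raises OSError when the address does not respond.
    """
    try:
        status = fetch(address)
    except OSError:
        return "no_response"
    return "ok" if status < 400 else "error"


def build_report(access_point_id, address, outcome):
    """Return a failure report for the management entity, or None on success."""
    if outcome == "ok":
        return None
    return {"access_point": access_point_id,
            "address": address,
            "outcome": outcome}
```

Injecting `fetch` keeps the sketch independent of any particular network library.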
The reports are analysed in a retrieval unit 32. If several access points report a failure of the same target network address 5, this is flagged as a potential fault with the target address and reported to a server fault management system 33, for example as a possible denial of service (DoS) issue. However, if more than one, or all, of the target addresses tested by an individual access point are not responding, the report analyser 32 may identify this as a potential problem with the access point, for example with its security settings or backhaul connection, and report to the backhaul fault management system 34. (It will be appreciated that only faults short of complete failure of the backhaul connection could be reported in this way.)
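The two-way analysis described above can be sketched by grouping failure reports both by address and by access point. This is a simplified illustration under stated assumptions: the function name and both thresholds are hypothetical, and no attempt is made to separate an access point fault from a coincidence of several genuinely failing addresses.

```python
from collections import defaultdict


def classify_failures(reports, min_access_points=2, min_addresses=2):
    """Split failure reports into suspect addresses and suspect access points.

    `reports` is an iterable of (access_point_id, address) pairs, one per
    reported failure. Both thresholds are assumed values.
    """
    aps_by_address = defaultdict(set)
    addresses_by_ap = defaultdict(set)
    for access_point, address in reports:
        aps_by_address[address].add(access_point)
        addresses_by_ap[access_point].add(address)

    # Same address reported by several access points: possible server fault.
    suspect_addresses = {address for address, aps in aps_by_address.items()
                         if len(aps) >= min_access_points}
    # Several addresses failing from one access point: possible access point
    # or backhaul fault.
    suspect_access_points = {ap for ap, addresses in addresses_by_ap.items()
                             if len(addresses) >= min_addresses}
    return suspect_addresses, suspect_access_points
```

In this sketch the first set would be passed to the server fault management system 33 and the second to the backhaul fault management system 34.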
The management entity 3 can process such reports to identify clusters or patterns to help identify the cause of the issue. This allows the network operator to be more proactive, knowing about the issue and fixing it even before the customer notices. As the access point reports potential faults, faults can be detected even if no user terminal is currently connected to the access point, so that the problem can be reported to the network management system 3 before the user needs to use the address.
The response monitor system 28 stores the status of the URL, and when the next check is performed by the test generation system 27 the response is again analysed by the monitor unit. After a certain time t (step 54, 55) the test generation system performs another check. The process depicted in Figure 5 is arranged such that a change of status is reported to the management entity 3. If the address is still returning a fault report (step 52), the management entity 3 is not informed again, but if it has returned to activity the management entity is informed (step 53), so that the management entity stops taking any further action.
The interval t’ between tests may be shorter when a URL is on record as faulty (step 55) than the time t when it is operating normally (step 54), so that updates are received more frequently.
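The change-of-status reporting and the two test intervals can be combined in a small state holder. The sketch below is illustrative only: the class name, the interval values and the report labels are hypothetical, and an address with no stored status is assumed to have been working, since it entered the store through successful user accesses.

```python
class ResponseMonitor:
    """Illustrative stand-in for the response monitor 28 (steps 51 to 55)."""

    def __init__(self, normal_interval_s=3600, faulty_interval_s=600):
        self.normal_interval_s = normal_interval_s  # interval t (assumed value)
        self.faulty_interval_s = faulty_interval_s  # shorter interval t' (assumed)
        self._last_ok = {}                          # address -> last known status

    def handle_result(self, address, ok):
        """Return (report, next_interval_s) for one test result.

        `report` is 'fault' or 'recovered' when the status has changed,
        and None when it is unchanged, so that nothing is sent.
        """
        previous = self._last_ok.get(address, True)
        self._last_ok[address] = ok
        report = None
        if ok != previous:
            report = "recovered" if ok else "fault"
        interval = self.normal_interval_s if ok else self.faulty_interval_s
        return report, interval
```

A repeated failure therefore produces no further report but keeps the address on the shorter test interval until it recovers.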

Claims (11)

1. A process for recording access attempts to network addresses made by user terminals through an access point, attempting access from the access point to network addresses for which successful access attempts have been recorded in a predetermined period, and reporting to a network management system any such network addresses that fail to respond.
2. A process according to Claim 1, in which the network management system is responsive to multiple failure reports from different access points relating to the same network address by recording occurrence of a potential fault condition associated with the network address.
3. A process according to Claim 1 or Claim 2, in which the network management system is responsive to multiple failures of access attempts from a given access point by recording occurrence of a potential fault condition associated with the access point.
4. A process according to Claim 1, Claim 2 or Claim 3, in which access attempts by user terminals to each address are counted during a predetermined period, and in which the access point attempts access periodically to addresses which have been recorded as having been accessed through the access point more than a predetermined number of times.
5. A process according to Claim 1, Claim 2, Claim 3 or claim 4, in which access attempts are made at different times of day.
6. A process according to Claim 1, Claim 2, Claim 3, Claim 4 or Claim 5, in which, following a first access attempt, a subsequent access attempt is made after an interval which is selected according to whether the first access attempt fails or succeeds.
7. A process according to Claim 1, Claim 2, Claim 3, Claim 4, Claim 5 or Claim 6, in which an alert is transmitted to the remote management system only if a response received by the response monitor in response to a test access request to a network address is different from a preceding response received by the response monitor in response to a previous test access request to the same network address.
8. A communications access point for connecting one or more user terminals to a data communications network, having an access request monitor for detecting and recording access requests made by user terminals connected to the access point to target addresses, a test generation processor for generating test access requests for transmission over the data communications network to the target addresses, a response monitor for detecting responses to the test messages received from the data communications network by way of the data communications network, and an alerting processor for generating reports of failed responses, for transmission to a remote management system.
9. A communications access point according to Claim 8, comprising a counter for counting access attempts to a network address, and for controlling the test generation processor to generate access requests in response to the counter identifying a predetermined number of requests in a predetermined time.
10. A communications access point according to Claim 8 or Claim 9, wherein the test generation processor is configured to generate test access requests at staggered times of day.
11. A communications access point according to Claim 8, Claim 9 or Claim 10, wherein the test generation processor is controlled by the response monitor to generate test access requests to a network address at a first interval if the response monitor records a successful response to the previous request, and at a second, shorter interval if the response monitor records a failed response to the previous request.
12. A communications access point according to Claim 8, Claim 9, or Claim 10, wherein the alerting processor is configured to transmit an alert to the remote management system only if a response received by the response monitor in response to a test access request to a network address is different from a preceding response received by the response monitor in response to a previous test access request to the same network address.
13. A process for remote configuration of a programmable device to operate according to the communications access point of claim 8, claim 9, Claim 10, claim 11, or Claim 12 or claim 13, by transmission of programme data to the wireless-enabled device over a data communications connection.
14. A computer system including a processor and memory storing computer program code for performing the steps of the process of Claim 1, Claim 2, Claim 3, Claim 4, Claim 5, Claim 6, or claim 7.
15. A computer program element comprising computer program code to, when loaded into a computer system and executed thereon, cause the computer to perform the steps of a process as claimed in any of Claim 1, Claim 2, Claim 3, Claim 4, Claim 5, Claim 6 or claim 7.
Intellectual Property Office
Application No: GB1705353.9 Examiner: Mr Hitesh Kerai
GB1705353.9A 2017-04-03 2017-04-03 Network Fault Discovery Active GB2561181B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
GB1705353.9A GB2561181B (en) 2017-04-03 2017-04-03 Network Fault Discovery

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB1705353.9A GB2561181B (en) 2017-04-03 2017-04-03 Network Fault Discovery

Publications (3)

Publication Number Publication Date
GB201705353D0 GB201705353D0 (en) 2017-05-17
GB2561181A GB2561181A (en) 2018-10-10
GB2561181B GB2561181B (en) 2020-03-18

Family

ID=58682720

Family Applications (1)

Application Number Title Priority Date Filing Date
GB1705353.9A Active GB2561181B (en) 2017-04-03 2017-04-03 Network Fault Discovery

Country Status (1)

Country Link
GB (1) GB2561181B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090013210A1 (en) * 2007-06-19 2009-01-08 Mcintosh P Stuckey Systems, devices, agents and methods for monitoring and automatic reboot and restoration of computers, local area networks, wireless access points, modems and other hardware
US20090161556A1 (en) * 2007-12-19 2009-06-25 Zhiqiang Qian Methods and Apparatus for Fault Identification in Border Gateway Protocol Networks
US20130215768A1 (en) * 2012-02-21 2013-08-22 Avaya Inc. System and method for automatic dscp tracing for xoip elements

Also Published As

Publication number Publication date
GB201705353D0 (en) 2017-05-17
GB2561181B (en) 2020-03-18

Similar Documents

Publication Publication Date Title
US7640460B2 (en) Detect user-perceived faults using packet traces in enterprise networks
US8443074B2 (en) Constructing an inference graph for a network
EP3248330B1 (en) Method and system for isp network performance monitoring and fault detection
CN100417081C (en) Method, system for checking and repairing a network configuration
US11632320B2 (en) Centralized analytical monitoring of IP connected devices
CN113472607B (en) Application program network environment detection method, device, equipment and storage medium
US20160283307A1 (en) Monitoring system, monitoring device, and test device
JP4598065B2 (en) Monitoring simulation apparatus, method and program thereof
EP3682595B1 (en) Obtaining local area network diagnostic test results
JP5617304B2 (en) Switching device, information processing device, and fault notification control program
WO2021040846A1 (en) Automated detection and classification of dynamic service outages
WO2006118858A2 (en) Wireless data device performance monitor
CN111104239A (en) Hard disk fault processing method, system and device for distributed storage cluster
US11153769B2 (en) Network fault discovery
WO2021233563A1 (en) Service producer health-check
GB2561181A (en) Network Fault Discovery
CN108880994B (en) Method and device for retransmitting mails
JP6488600B2 (en) Information processing system, program, and information processing apparatus
CN113824595A (en) Link switching control method and device and gateway equipment
CN110225543B (en) Mobile terminal software quality situation perception system and method based on network request data
KR102229613B1 (en) Method and apparatus for web firewall maintenance based on non-face-to-face authentication using maching learning self-check function
CN114124897B (en) CDN node control method and device, electronic equipment and readable storage medium
Bapat et al. Chowkidar: Reliable and scalable health monitoring for wireless sensor network testbeds
CN116841834A (en) State adjustment method and device, storage medium and electronic device
CN116866150A (en) Method and device for uploading alarm information of fault network element