WO2016033897A1 - Network link monitoring method and device, network system, and storage medium - Google Patents

Network link monitoring method and device, network system, and storage medium

Info

Publication number
WO2016033897A1
Authority
WO
WIPO (PCT)
Prior art keywords
link
way delay
standby node
module
delay detection
Prior art date
Application number
PCT/CN2014/093557
Other languages
English (en)
French (fr)
Inventor
卓泽城
张鹏飞
王碧茜
刘斌
刘文波
Original Assignee
百度在线网络技术(北京)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 百度在线网络技术(北京)有限公司
Priority to US14/902,308 (patent US10033592B2, en)
Publication of WO2016033897A1 (zh)

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/12 Discovery or management of network topologies
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0852 Delays
    • H04L43/0858 One way delays
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/04 Network management architectures or arrangements
    • H04L41/044 Network management architectures or arrangements comprising hierarchical management structures
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0677 Localisation of faults
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/08 Configuration management of networks or network elements
    • H04L41/0803 Configuration setting
    • H04L41/0806 Configuration setting for initial configuration or provisioning, e.g. plug-and-play
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/142 Network analysis or design using statistical or mathematical methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0805 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability
    • H04L43/0811 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability by checking connectivity
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/12 Network monitoring probes

Definitions

  • The present invention relates to the field of communications and, in particular, to a network link monitoring method and device, a network system, and a storage medium.
  • IDC: Internet Data Center
  • Existing network link monitoring technology collects device port traffic. It can confirm that a given port of a given device is congested at a given moment, but it cannot confirm congestion along an entire service data flow and cannot measure end-to-end delay.
  • The commonly used traceroute tool probes from the servers at the two ends and has the following problems: it cannot monitor all links on the entire network, it cannot solve the problems caused by asymmetric links, and it cannot locate problems quickly.
  • Embodiments of the present invention provide a network link monitoring method and device.
  • An embodiment of the present invention provides a network link monitoring method, where the method includes:
  • the master node module sends a configuration file to multiple standby node modules;
  • the standby node module executes a one-way delay detection algorithm, performs one-way delay detection on the minimum link coverage set, obtains a one-way delay detection result, and returns the one-way delay detection result to the master node module; and
  • the master node module determines whether to trigger an alarm program according to the set alarm threshold.
  • An embodiment of the present invention provides a network link monitoring device, where the device includes:
  • a master node module, configured to send a configuration file to multiple standby node modules, receive the one-way delay detection result returned by the standby node modules, and determine whether to trigger an alarm program according to the set alarm threshold; and
  • a plurality of standby node modules, configured to receive the configuration file, execute a minimum link coverage algorithm to obtain a minimum link coverage set of the data probe flows between the standby node modules, then execute a one-way delay detection algorithm, perform one-way delay detection on the minimum link coverage set, obtain a one-way delay detection result, and return the one-way delay detection result to the master node module.
  • An embodiment of the present invention provides a network system, where the system includes:
  • one or more processors;
  • a memory; and one or more programs, the one or more programs being stored on the memory;
  • the one or more processors perform the steps of the network link monitoring method described above when executing the one or more programs.
  • Embodiments of the present invention provide a non-volatile computer storage medium storing one or more programs that, when executed by one or more devices, cause the devices to perform the steps of the network link monitoring method described above.
  • The various embodiments of the present invention can monitor all links of the entire network accurately, efficiently, and comprehensively, while solving the delay problem caused by asymmetric links and quickly locating failed links.
  • FIG. 1 is a flowchart of a network link monitoring method according to an embodiment of the present invention;
  • FIG. 2 shows a specific flowchart of the minimum link coverage algorithm in step S1 of FIG. 1;
  • FIG. 3 shows a specific flowchart of the one-way delay detection algorithm in step S2 of FIG. 1;
  • FIG. 4 shows a specific flowchart of the link anomaly localization algorithm in step S3 of FIG. 3;
  • FIG. 5 is an architecture diagram of a network link monitoring device according to an embodiment of the present invention;
  • FIG. 6 shows a block diagram of the master node module 10 shown in FIG. 5;
  • FIG. 7 shows a block diagram of the standby node module 20 shown in FIG. 5.
  • FIG. 1 is a flowchart of a network link monitoring method according to an embodiment of the present invention. Referring to FIG. 1, in an embodiment of the present invention, the method includes:
  • Step S1: the master node module sends a configuration file to multiple standby node modules.
  • Step S2: the standby node module receives the configuration file, executes a minimum link coverage algorithm on the configuration file, and obtains a minimum link coverage set of the data probe flows between the standby node modules.
  • Step S3: the standby node module executes a one-way delay detection algorithm, performs one-way delay detection on the minimum link coverage set, obtains a one-way delay detection result, and returns the one-way delay detection result to the master node module.
  • Step S4: the master node module determines whether to trigger an alarm program according to the set alarm threshold.
  • The master node module receives the configuration file from the user (including the detection period of the data probe flows, the list of source and destination equipment rooms to probe, the alarm threshold, and so on) and delivers the configuration file to each standby node module over an HTTP connection.
  • The master node module periodically receives the detection results returned by the standby node modules, decides whether to trigger the alarm program according to the alarm threshold preset by the user, and displays the detection results on the front end through the web server.
  • In use, the user first delivers a configuration file to the standby node modules through the equipment room to be monitored and then reloads the master node module, which automatically delivers the configuration file to each standby node module.
  • After receiving the new configuration file, each standby node module promptly and automatically updates its own configuration file, performs periodic probing according to the new configuration file, and then returns the detection results to the master node module.
  • The master node module aggregates the detection results, displays them on the front end, and determines whether to trigger the alarm program according to the preset alarm threshold.
  • The alarm may be raised by email or by short message (SMS), among other ways.
  • FIG. 2 shows a specific flowchart of the minimum link coverage algorithm in step S1 of FIG. 1.
  • The minimum link coverage algorithm includes:
  • Step S11: input the address information of the standby nodes, and compute the total set of sub-links of all links between the standby nodes;
  • Step S12: construct data probe flows to probe all the links;
  • Step S13: compute the link coverage rate of the sub-links and, when the link coverage rate is greater than a set link coverage threshold, add the sub-links to the probe flow set; and
  • Step S14: mark the probed sub-links in the total sub-link set to obtain the current link coverage rate; when the current link coverage rate exceeds the coverage rate threshold, output the probe flow set. The output probe flow set is the minimum link coverage set of the data probe flows between the standby node modules.
  • The link coverage threshold and the coverage rate threshold are set according to the link coverage requirements.
  • The link coverage threshold is a user-defined threshold that reflects the aim of covering all sub-links with as few probe flows as possible; for example, the link coverage rate of the first probe flow is 100%.
  • The coverage rate threshold reflects the proportion of the entire network covered by the existing probe flows. Ideally, a coverage rate threshold of 100% means that all links on the entire network are covered. To save probing time and the cost of probing resources, the coverage rate threshold can also be customized, for example to 90% or 50%.
  • FIG. 3 shows a specific flowchart of the one-way delay detection algorithm in step S2 of FIG. 1.
  • The one-way delay detection algorithm includes the following steps:
  • Step S21: establish a control link between the standby nodes and send the data probe flow;
  • Step S22: record, at the sending end, the time at which the data probe flow is sent; record, at the receiving end, the time at which the data probe flow is received; and compute the difference between the sending time and the receiving time to obtain a single one-way delay result;
  • Step S23: repeat step S22 a predetermined number of times to obtain the predetermined number of one-way delay results; and
  • Step S24: average the predetermined number of one-way delay results to obtain the one-way delay detection result.
  • The one-way delay detection algorithm addresses the problem that end-to-end delay cannot be measured accurately when an asymmetric network makes the forward and return paths differ.
  • A control link is first established between two servers, the data flows found by linkCover are then used for probing, and the timestamps of the sending time and the arrival time are recorded; the difference between the two timestamps gives the one-way delay data.
  • To obtain more accurate one-way delay data, the measurement can be repeated several times and averaged. More measurements, however, lengthen the measurement time and lower the measurement efficiency. To obtain relatively accurate one-way delay data in relatively little time, the predetermined number of measurements can be kept to 3 to 5.
  • FIG. 4 shows a specific flowchart of the link anomaly localization algorithm in step S3 of FIG. 3.
  • The link anomaly localization algorithm includes:
  • Step S31: input the time information, address information, and alarm threshold of the standby nodes, collect statistics on the data probe flows between the standby nodes, and generate an abnormal flow set and a normal flow set;
  • Step S32: count how often each sub-link in the abnormal flow set appears in the normal flow set;
  • Step S33: select the abnormal sub-links whose frequency is lower than a normal frequency threshold; and
  • Step S34: sort the abnormal sub-links by frequency in ascending order.
  • The link anomaly localization algorithm addresses how to locate a faulty link quickly when the network links have a problem. It shortens localization time by narrowing the set of links to investigate and pinning the problem down to the sub-link level.
  • In step S33, the "normal frequency threshold" is the minimum value of the frequency under normal conditions and can be set as needed (for example, 3, 5, or 10 times). "Selecting the abnormal sub-links whose frequency is lower than the normal frequency threshold" means selecting the abnormal sub-links whose frequency counted in step S32 is 0 or small, because these links are the most likely cause of the network congestion. Step S34 is then performed.
  • Sorting the abnormal sub-links by frequency in ascending order helps network operations staff narrow the troubleshooting scope and locate the network fault quickly. In addition, because constructed data probe flows are used rather than actual service flows, no service traffic is lost.
  • FIG. 5 is an architecture diagram of a network link monitoring device according to an embodiment of the present invention.
  • The device includes:
  • the master node module 10, configured to send a configuration file to the multiple standby node modules 20, receive the one-way delay detection result returned by the standby node modules 20, and determine whether to trigger an alarm program according to the set alarm threshold; and
  • the plurality of standby node modules 20, configured to receive the configuration file, execute a minimum link coverage algorithm to obtain a minimum link coverage set of the data probe flows between the standby node modules, then execute a one-way delay detection algorithm, perform one-way delay detection on the minimum link coverage set, obtain a one-way delay detection result, and return the one-way delay detection result to the master node module 10.
  • The master node module 10 may be associated with a web server. The master node module 10 receives the configuration file from the user through the web server (including the detection period of the data probe flows, the list of source and destination equipment rooms to probe, the alarm threshold, and so on) and delivers the configuration file to each standby node module 20 over an HTTP connection.
  • The master node module 10 periodically receives the detection results returned by the standby node modules 20, decides whether to trigger the alarm program according to the alarm threshold preset by the user, and displays the detection results on the front end through the web server.
  • FIG. 6 shows a block diagram of the master node module 10 shown in FIG. 5.
  • The master node module 10 includes:
  • the probe scheduling center module 11, configured to send a configuration file to the multiple standby node modules; and
  • the alarm module 12, configured to receive the one-way delay detection result returned by the standby node modules and determine whether to trigger an alarm program according to the set alarm threshold.
  • The probe scheduling center module 11 automatically delivers the configuration file to each standby node module. After receiving the new configuration file, each standby node module promptly and automatically updates its own configuration file, performs periodic probing according to the new configuration file, and then returns the detection results to the probe scheduling center module 11. The probe scheduling center module 11 aggregates the detection results, displays them on the front end, and determines whether to trigger the alarm module 12 according to the preset alarm threshold.
  • FIG. 7 shows a block diagram of the standby node module 20 shown in FIG. 5.
  • The standby node module 20 includes:
  • a link coverage module 21, configured to receive the configuration file and execute the minimum link coverage algorithm to obtain the minimum link coverage set of the data probe flows between the standby node modules; and
  • a delay detection module 22, configured to execute the one-way delay detection algorithm, perform one-way delay detection on the minimum link coverage set, obtain a one-way delay detection result, and return the one-way delay detection result to the master node module.
  • The standby node module may further include an anomaly localization module, configured to execute the link anomaly localization algorithm to locate the failed sub-link when the alarm program is started.
  • The link coverage module 21 and the delay detection module 22 are two parallel, associated modules. The link coverage module 21 executes the minimum link coverage algorithm on the received data probe flows to obtain a preferred link coverage scheme that covers as many links as possible, or all links, with as few data probe flows as possible.
  • The link coverage module 21 may act as a preparatory module for the delay detection module 22. Specifically, the link coverage module 21 first obtains the minimum link coverage set of the data probe flows between the standby node modules 20; the delay detection module 22 then obtains the one-way delay detection result and returns it to the master node module. To make link probing and monitoring more efficient and reasonable, the link coverage module 21 may run only once within a given detection period to obtain the minimum link coverage set.
  • The delay detection module 22 may then reuse the minimum link coverage set obtained by the link coverage module 21 as many times as one-way delay detection requires.
  • The anomaly localization module runs only when the alarm module in the master node module has started the alarm program. By executing the link anomaly localization algorithm, it can quickly locate the failed sub-link and return the localization result to the master node module, which effectively solves the problem of locating abnormal network links.
  • An embodiment of the present invention further provides a network system, where the system includes:
  • one or more processors;
  • a memory; and one or more programs, the one or more programs being stored on the memory;
  • the one or more processors perform the steps of the network link monitoring method described above when executing the one or more programs.
  • For the network link monitoring method, refer to the foregoing description; details are not repeated here.
  • Embodiments of the present invention also provide a non-volatile computer storage medium storing one or more programs that, when executed by one or more devices, cause the devices to perform the steps of the network link monitoring method described above.
  • For the network link monitoring method, refer to the foregoing description; details are not repeated here.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Environmental & Geological Engineering (AREA)
  • Mathematical Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Algebra (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Pure & Applied Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

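The present invention discloses a network link monitoring method and device, a network system, and a storage medium. The method includes: a master node module sends a configuration file to multiple standby node modules; the standby node modules receive the configuration file and execute a minimum link coverage algorithm on it to obtain a minimum link coverage set of the data probe flows between the standby node modules; the standby node modules execute a one-way delay detection algorithm, perform one-way delay detection on the minimum link coverage set, obtain a one-way delay detection result, and return the one-way delay detection result to the master node module; and the master node module determines whether to trigger an alarm program according to a set alarm threshold. The invention makes it possible to monitor all links of the entire network accurately, efficiently, and comprehensively, while solving the delay problem caused by asymmetric links and quickly locating failed links.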

Description

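Network link monitoring method and device, network system, and storage medium
This application claims priority to Chinese patent application No. 201410443239.6, titled "Network link monitoring method and device", filed with the Chinese Patent Office on September 2, 2014, the entire contents of which are incorporated herein by reference.
Technical Field
The present invention relates to the field of communications and, more specifically, to a network link monitoring method and device, a network system, and a storage medium.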
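Background
As network information technology develops, enterprise networks keep evolving as well. An enterprise's IDC (Internet Data Center) networks are typically spread across several regions, and multiple layers of network devices sit inside each IDC and between IDCs, so monitoring link quality across the whole network is very challenging. First, the volume of link data on the entire network is huge, and it is generally difficult to monitor every link. Second, because the forward and return paths of an asymmetric network differ, end-to-end delay cannot be measured accurately. Finally, when the network fails, it is difficult to locate the faulty link quickly.
Existing network link monitoring technology collects device port traffic. It can confirm that a given port of a given device is congested at a given moment, but it cannot confirm congestion along an entire service data flow and cannot measure end-to-end delay. The commonly used traceroute tool probes from the servers at the two ends and has the following problems: it cannot monitor all links on the entire network, it cannot solve the problems caused by asymmetric links, and it cannot locate problems quickly.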
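Summary of the Invention
To monitor network links accurately, efficiently, and comprehensively, embodiments of the present invention provide a network link monitoring method and device.
In one aspect, an embodiment of the present invention provides a network link monitoring method, the method including:
a master node module sends a configuration file to multiple standby node modules;
the standby node module receives the configuration file and executes a minimum link coverage algorithm on the configuration file to obtain a minimum link coverage set of the data probe flows between the standby node modules;
the standby node module executes a one-way delay detection algorithm, performs one-way delay detection on the minimum link coverage set, obtains a one-way delay detection result, and returns the one-way delay detection result to the master node module; and
the master node module determines whether to trigger an alarm program according to a set alarm threshold.
Correspondingly, an embodiment of the present invention provides a network link monitoring device, the device including:
a master node module, configured to send a configuration file to multiple standby node modules, receive the one-way delay detection result returned by the standby node modules, and determine whether to trigger an alarm program according to a set alarm threshold; and
a plurality of standby node modules, configured to receive the configuration file, execute a minimum link coverage algorithm to obtain a minimum link coverage set of the data probe flows between the standby node modules, then execute a one-way delay detection algorithm, perform one-way delay detection on the minimum link coverage set, obtain a one-way delay detection result, and return the one-way delay detection result to the master node module.
Furthermore, an embodiment of the present invention provides a network system, the system including:
one or more processors;
a memory;
one or more programs, the one or more programs being stored on the memory;
when executing the one or more programs, the one or more processors perform the steps of the network link monitoring method described above.
In addition, an embodiment of the present invention provides a non-volatile computer storage medium storing one or more programs that, when executed by one or more devices, cause the devices to perform the steps of the network link monitoring method described above.
Implementing the various embodiments of the present invention makes it possible to monitor all links of the entire network accurately, efficiently, and comprehensively, while solving the delay problem caused by asymmetric links and quickly locating failed links.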
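Brief Description of the Drawings
FIG. 1 is a flowchart of a network link monitoring method according to an embodiment of the present invention;
FIG. 2 shows a specific flowchart of the minimum link coverage algorithm in step S1 of FIG. 1;
FIG. 3 shows a specific flowchart of the one-way delay detection algorithm in step S2 of FIG. 1;
FIG. 4 shows a specific flowchart of the link anomaly localization algorithm in step S3 of FIG. 3;
FIG. 5 is an architecture diagram of a network link monitoring device according to an embodiment of the present invention;
FIG. 6 shows a block diagram of the master node module 10 shown in FIG. 5;
FIG. 7 shows a block diagram of the standby node module 20 shown in FIG. 5.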
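Detailed Description
Various aspects of the present invention are described in detail below with reference to the drawings and specific embodiments. Well-known modules, units, and their connections, links, communications, or operations are not shown or described in detail. The described features, architectures, or functions may be combined in any manner in one or more embodiments. Those skilled in the art should understand that the embodiments below are illustrative only and do not limit the scope of protection of the present invention. It will also be readily understood that the modules, units, or processing methods in the embodiments described herein and shown in the drawings may be combined and designed in various different configurations.
FIG. 1 is a flowchart of a network link monitoring method according to an embodiment of the present invention. Referring to FIG. 1, in an embodiment of the present invention, the method includes:
Step S1: the master node module sends a configuration file to multiple standby node modules;
Step S2: the standby node module receives the configuration file and executes a minimum link coverage algorithm on the configuration file to obtain a minimum link coverage set of the data probe flows between the standby node modules;
Step S3: the standby node module executes a one-way delay detection algorithm, performs one-way delay detection on the minimum link coverage set, obtains a one-way delay detection result, and returns the one-way delay detection result to the master node module; and
Step S4: the master node module determines whether to trigger an alarm program according to the set alarm threshold.
The master node module receives the configuration file from the user (including the detection period of the data probe flows, the list of source and destination equipment rooms to probe, the alarm threshold, and so on) and delivers the configuration file to each standby node module over an HTTP connection. The master node module periodically receives the detection results returned by the standby node modules, decides whether to trigger the alarm program according to the alarm threshold preset by the user, and displays the detection results on the front end through the Web server. In use, the user first delivers a configuration file to the standby node modules through the equipment room to be monitored and then reloads the master node module. The master node module automatically delivers the configuration file to each standby node module. After receiving the new configuration file, each standby node module promptly and automatically updates its own configuration file, performs periodic probing according to the new configuration file, and then returns the detection results to the master node module. The master node module aggregates the detection results, displays them on the front end, and determines whether to trigger the alarm program according to the preset alarm threshold. The alarm may be raised by email or by short message (SMS), among other ways.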
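The configuration contents and the alarm decision described above can be sketched as follows. This is a minimal illustration only: the field names (`probe_period_s`, `source_rooms`, `dest_rooms`, `alarm_threshold_ms`) and the function `flows_over_threshold` are hypothetical and do not come from the patent, which lists only the kinds of information the configuration file carries. The sketch also assumes that an alarm is warranted when a reported one-way delay is greater than the threshold; the patent says only that the decision is made "according to" the threshold.

```python
from dataclasses import dataclass


@dataclass
class ProbeConfig:
    """Hypothetical configuration file contents (all field names are assumptions)."""
    probe_period_s: int        # detection period of the data probe flows
    source_rooms: list         # source equipment rooms to probe from
    dest_rooms: list           # destination equipment rooms to probe to
    alarm_threshold_ms: float  # one-way delay treated here as the alarm limit


def flows_over_threshold(delay_results_ms, config):
    """Return the probe flows whose averaged one-way delay exceeds the threshold.

    delay_results_ms maps a (source, destination) pair to the one-way delay
    detection result reported by a standby node. A non-empty return value is
    what would trigger the alarm program in this sketch.
    """
    return {
        flow: delay
        for flow, delay in delay_results_ms.items()
        if delay > config.alarm_threshold_ms
    }
```

Keeping the threshold inside the configuration object mirrors the text, where the user supplies the alarm threshold together with the detection period and the equipment room lists.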
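FIG. 2 shows a specific flowchart of the minimum link coverage algorithm in step S1 of FIG. 1. Referring to FIG. 2, in an embodiment of the present invention, the minimum link coverage algorithm includes:
Step S11: input the address information of the standby nodes, and compute the total set of sub-links of all links between the standby nodes;
Step S12: construct data probe flows to probe all the links;
Step S13: compute the link coverage rate of the sub-links and, when the link coverage rate is greater than a set link coverage threshold, add the sub-links to the probe flow set; and
Step S14: mark the probed sub-links in the total sub-link set to obtain the current link coverage rate; when the current link coverage rate exceeds the coverage rate threshold, output the probe flow set. The output probe flow set is the minimum link coverage set of the data probe flows between the standby node modules.
Monitoring the links of the entire network by traversing every link would require constructing a huge number of probe flows, which consumes server resources, and the excess probe traffic would also occupy too much bandwidth. To solve these problems, the links of the whole network need to be monitored with as few probe flows as possible while using only a small amount of server resources and bandwidth; a minimum-flow link coverage algorithm can be used for this purpose. The minimum link coverage algorithm solves the problem indirectly by converting the various link paths between two servers into the total number of device-to-device sub-links. For example, if a sub-link is congested, every link that passes through that sub-link is congested. Coverage of the links between two servers thus becomes coverage of all sub-links between the devices that the two servers' traffic passes through, so the link status of all possible paths between the two servers can be monitored with fewer probe flows. Note that the link coverage threshold and the coverage rate threshold are set according to the link coverage requirements. The link coverage threshold is a user-defined threshold that reflects the aim of covering all sub-links with as few probe flows as possible; for example, the link coverage rate of the first probe flow is 100%. The coverage rate threshold reflects the proportion of the entire network covered by the existing probe flows. Ideally, a coverage rate threshold of 100% means that all links on the entire network are covered. To save probing time and the cost of probing resources, the coverage rate threshold can also be customized, for example to 90% or 50%.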
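Steps S11 to S14 can be sketched as a single greedy pass over candidate probe flows. This is a hedged illustration, not the patent's implementation: the function name `min_link_cover`, its parameters, and the input shape (a mapping from a flow id to the set of sub-links the flow traverses) are assumptions. The sketch also interprets a flow's "link coverage rate" as the share of its sub-links that are not yet covered, which is consistent with the example that the first probe flow has a rate of 100%, and it uses `>=` rather than a strict comparison so that a threshold of 1.0 remains reachable.

```python
def min_link_cover(candidate_flows, link_cover_threshold=0.5, coverage_threshold=1.0):
    """Greedy sketch of steps S11-S14 (names and input shape are assumptions).

    candidate_flows maps a flow id to the set of sub-links (device-to-device
    hops) that the flow traverses. Returns (selected flow ids, coverage rate).
    """
    # S11: the total sub-link set is the union of the sub-links of all paths.
    all_sublinks = set().union(*candidate_flows.values()) if candidate_flows else set()
    covered = set()
    selected = []
    if not all_sublinks:
        return selected, 0.0
    # S12: walk the constructed probe flows in the order they were given.
    for flow_id, sublinks in candidate_flows.items():
        if not sublinks:
            continue
        # S13: share of this flow's sub-links that no selected flow covers yet.
        new_ratio = len(sublinks - covered) / len(sublinks)
        if new_ratio >= link_cover_threshold:
            selected.append(flow_id)
            # S14: mark these sub-links as probed.
            covered |= sublinks
        # S14: stop once the overall coverage rate reaches its threshold.
        if len(covered) / len(all_sublinks) >= coverage_threshold:
            break
    return selected, len(covered) / len(all_sublinks)
```

Raising `link_cover_threshold` makes the pass skip flows that mostly re-probe sub-links already covered, which is how the selected set stays small; lowering `coverage_threshold` (for example to 0.9) ends the pass earlier at the cost of leaving some sub-links unprobed.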
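FIG. 3 shows a specific flowchart of the one-way delay detection algorithm in step S2 of FIG. 1. Referring to FIG. 3, in an embodiment of the present invention, the one-way delay detection algorithm includes the following steps:
Step S21: establish a control link between the standby nodes and send the data probe flow;
Step S22: record, at the sending end, the time at which the data probe flow is sent; record, at the receiving end, the time at which the data probe flow is received; and compute the difference between the sending time and the receiving time to obtain a single one-way delay result;
Step S23: repeat step S22 a predetermined number of times to obtain the predetermined number of one-way delay results; and
Step S24: average the predetermined number of one-way delay results to obtain the one-way delay detection result.
The one-way delay detection algorithm addresses the problem that end-to-end delay cannot be measured accurately when an asymmetric network makes the forward and return paths differ. A control link is first established between two servers, the data flows found by linkCover are then used for probing, and the timestamps of the sending time and the arrival time are recorded; the difference between the two timestamps gives the one-way delay data. To obtain more accurate one-way delay data, the measurement can be repeated several times and averaged. More measurements, however, lengthen the measurement time and lower the measurement efficiency. To reach the best measurement state, that is, relatively accurate one-way delay data in relatively little time, the predetermined number of measurements can be kept to 3 to 5.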
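The arithmetic of steps S22 to S24 can be sketched as below. The function name `one_way_delay` and its list-based inputs are assumptions made for illustration. The sketch also assumes that the sender and receiver clocks are synchronized, which the patent text does not discuss; without synchronization, the difference between the two timestamps would include the clock offset.

```python
from statistics import mean


def one_way_delay(send_times, recv_times):
    """Average one-way delay over repeated probes (sketch of steps S22-S24).

    send_times[i] is the timestamp recorded at the sending end for probe i,
    recv_times[i] the timestamp recorded at the receiving end for the same
    probe. Both clocks are assumed to be synchronized.
    """
    if len(send_times) != len(recv_times) or not send_times:
        raise ValueError("need one receive timestamp per send timestamp")
    # S22: a single result is the receiving time minus the sending time.
    singles = [recv - sent for sent, recv in zip(send_times, recv_times)]
    # S23/S24: average over the predetermined number of repetitions.
    return mean(singles)
```

With the 3 to 5 repetitions suggested in the text, each list would hold 3 to 5 timestamps per probed flow.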
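FIG. 4 shows a specific flowchart of the link anomaly localization algorithm in step S3 of FIG. 3. Referring to FIG. 4, in an embodiment of the present invention, the link anomaly localization algorithm includes:
Step S31: input the time information, address information, and alarm threshold of the standby nodes, collect statistics on the data probe flows between the standby nodes, and generate an abnormal flow set and a normal flow set;
Step S32: count how often each sub-link in the abnormal flow set appears in the normal flow set;
Step S33: select the abnormal sub-links whose frequency is lower than a normal frequency threshold; and
Step S34: sort the abnormal sub-links by frequency in ascending order.
The link anomaly localization algorithm addresses how to locate a faulty link quickly when the network links have a problem. It shortens localization time by narrowing the set of links to investigate and pinning the problem down to the sub-link level. In step S33, the "normal frequency threshold" is the minimum value of the frequency under normal conditions and can be set as needed (for example, 3, 5, or 10 times). "Selecting the abnormal sub-links whose frequency is lower than the normal frequency threshold" means selecting the abnormal sub-links whose frequency counted in step S32 is 0 or small, because these links are the most likely cause of the network congestion. Step S34 is then performed: sorting the abnormal sub-links by frequency in ascending order helps network operations staff narrow the troubleshooting scope and locate the network fault quickly. In addition, because constructed data probe flows are used rather than actual service flows, no service traffic is lost.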
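Steps S31 to S34 can be sketched as a frequency count over the sub-links of the normal flows. The function name `locate_abnormal_sublinks`, its parameters, and the input shapes are hypothetical. The sketch assumes that a flow is abnormal when its one-way delay is greater than the alarm threshold, which the patent does not spell out, and it omits the time and address filtering mentioned in step S31.

```python
from collections import Counter


def locate_abnormal_sublinks(flow_paths, flow_delays_ms, alarm_threshold_ms,
                             normal_freq_threshold=3):
    """Rank suspect sub-links (sketch of steps S31-S34; names are assumptions).

    flow_paths maps a flow id to the sub-links the flow traverses, and
    flow_delays_ms maps the same flow ids to their one-way delay results.
    Returns (sub-link, frequency) pairs, least frequent first.
    """
    # S31: split the probe flows into an abnormal set and a normal set.
    abnormal = [f for f, d in flow_delays_ms.items() if d > alarm_threshold_ms]
    normal = [f for f, d in flow_delays_ms.items() if d <= alarm_threshold_ms]
    # S32: count how often each sub-link appears across the normal flows.
    normal_counts = Counter(link for f in normal for link in flow_paths[f])
    suspects = {link for f in abnormal for link in flow_paths[f]}
    # S33: keep suspects that rarely or never appear in normal flows.
    rare = [(link, normal_counts[link]) for link in suspects
            if normal_counts[link] < normal_freq_threshold]
    # S34: ascending frequency; ties are broken by sub-link name for determinism.
    return sorted(rare, key=lambda item: (item[1], item[0]))
```

A sub-link that also carries many normal flows is unlikely to be the congested one, so it is dropped in S33, and the first entries of the returned list are the ones to inspect first.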
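FIG. 5 is an architecture diagram of a network link monitoring device according to an embodiment of the present invention. Referring to FIG. 5, in an embodiment of the present invention, the device includes:
a master node module 10, configured to send a configuration file to multiple standby node modules 20, receive the one-way delay detection result returned by the standby node modules 20, and determine whether to trigger an alarm program according to a set alarm threshold; and
a plurality of standby node modules 20, configured to receive the configuration file, execute a minimum link coverage algorithm to obtain a minimum link coverage set of the data probe flows between the standby node modules, then execute a one-way delay detection algorithm, perform one-way delay detection on the minimum link coverage set, obtain a one-way delay detection result, and return the one-way delay detection result to the master node module 10.
The master node module 10 may be associated with a Web server. The master node module 10 receives the configuration file from the user through the Web server (including the detection period of the data probe flows, the list of source and destination equipment rooms to probe, the alarm threshold, and so on) and delivers the configuration file to each standby node module 20 over an HTTP connection. The master node module 10 periodically receives the detection results returned by the standby node modules 20, decides whether to trigger the alarm program according to the alarm threshold preset by the user, and displays the detection results on the front end through the Web server.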
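FIG. 6 shows a block diagram of the master node module 10 shown in FIG. 5. Referring to FIG. 6, in an embodiment of the present invention, the master node module 10 includes:
a probe scheduling center module 11, configured to send a configuration file to the multiple standby node modules; and
an alarm module 12, configured to receive the one-way delay detection result returned by the standby node modules and determine whether to trigger an alarm program according to the set alarm threshold.
The probe scheduling center module 11 automatically delivers the configuration file to each standby node module. After receiving the new configuration file, each standby node module promptly and automatically updates its own configuration file, performs periodic probing according to the new configuration file, and then returns the detection results to the probe scheduling center module 11. The probe scheduling center module 11 aggregates the detection results, displays them on the front end, and determines whether to trigger the alarm module 12 according to the preset alarm threshold.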
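FIG. 7 shows a block diagram of the standby node module 20 shown in FIG. 5. Referring to FIG. 7, in an embodiment of the present invention, the standby node module 20 includes:
a link coverage module 21, configured to receive the configuration file and execute the minimum link coverage algorithm to obtain the minimum link coverage set of the data probe flows between the standby node modules; and
a delay detection module 22, configured to execute the one-way delay detection algorithm, perform one-way delay detection on the minimum link coverage set, obtain a one-way delay detection result, and return the one-way delay detection result to the master node module.
The standby node module may further include an anomaly localization module, configured to execute the link anomaly localization algorithm to locate the failed sub-link when the alarm program is started.
The link coverage module 21 and the delay detection module 22 are two parallel, associated modules. The link coverage module 21 executes the minimum link coverage algorithm on the received data probe flows to obtain a preferred link coverage scheme that covers as many links as possible, or all links, with as few data probe flows as possible. The link coverage module 21 may act as a preparatory module for the delay detection module 22. Specifically, the link coverage module 21 first obtains the minimum link coverage set of the data probe flows between the standby node modules 20; the delay detection module 22 then obtains the one-way delay detection result and returns it to the master node module. To make link probing and monitoring more efficient and reasonable, the link coverage module 21 may run only once within a given detection period to obtain the minimum link coverage set; the delay detection module 22 may then reuse that minimum link coverage set as many times as one-way delay detection requires. In addition, the anomaly localization module runs only when the alarm module in the master node module has started the alarm program. By executing the link anomaly localization algorithm, it can quickly locate the failed sub-link and return the localization result to the master node module, which effectively solves the problem of locating abnormal network links.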
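Furthermore, an embodiment of the present invention also provides a network system, the system including:
one or more processors;
a memory;
one or more programs, the one or more programs being stored on the memory;
when executing the one or more programs, the one or more processors perform the steps of the network link monitoring method described above. For the network link monitoring method, refer to the foregoing description; details are not repeated here.
In addition, an embodiment of the present invention also provides a non-volatile computer storage medium storing one or more programs that, when executed by one or more devices, cause the devices to perform the steps of the network link monitoring method described above. For the network link monitoring method, refer to the foregoing description; details are not repeated here.
The solution provided by the present invention makes it possible to monitor all links of the entire network accurately, efficiently, and comprehensively, while solving the delay problem caused by asymmetric links and quickly locating failed links.
From the above description of the embodiments, those skilled in the art can clearly understand that the present invention can be implemented by software combined with a hardware platform, and of course can also be implemented entirely in hardware. Based on this understanding, all or part of the contribution that the technical solution of the present invention makes over the background art can be embodied in the form of a software product. The computer software product can be stored in a storage medium, such as ROM/RAM, a magnetic disk, or an optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a server, a smartphone, a network device, or the like) to perform the methods described in the embodiments of the present invention or in certain parts of the embodiments.
The terms and expressions used in this specification are for illustration only and are not intended to be limiting. Those skilled in the art should understand that various changes may be made to the details of the above embodiments without departing from the basic principles of the disclosed embodiments. The scope of the present invention is therefore determined only by the claims, in which, unless otherwise stated, all terms should be understood in their broadest reasonable sense.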

Claims (12)

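  1. A network link monitoring method, characterized in that the method comprises:
    a master node module sending a configuration file to multiple standby node modules;
    the standby node module receiving the configuration file and executing a minimum link coverage algorithm on the configuration file to obtain a minimum link coverage set of the data probe flows between the standby node modules;
    the standby node module executing a one-way delay detection algorithm, performing one-way delay detection on the minimum link coverage set, obtaining a one-way delay detection result, and returning the one-way delay detection result to the master node module; and
    the master node module determining whether to trigger an alarm program according to a set alarm threshold.
  2. The method according to claim 1, characterized in that the minimum link coverage algorithm comprises:
    inputting address information of the standby nodes, and computing the total set of sub-links of all links between the standby nodes;
    constructing data probe flows to probe all the links;
    computing the link coverage rate of the sub-links and, when the link coverage rate is greater than a set link coverage threshold, adding the sub-links to a probe flow set; and
    marking the probed sub-links in the total sub-link set to obtain a current link coverage rate and, when the current link coverage rate exceeds a coverage rate threshold, outputting the probe flow set, the output probe flow set being the minimum link coverage set of the data probe flows between the standby node modules.
  3. The method according to claim 2, characterized in that the link coverage threshold and the coverage rate threshold are set according to the link coverage requirements.
  4. The method according to any one of claims 1 to 3, characterized in that the one-way delay detection algorithm comprises the following steps:
    S21: establishing a control link between the standby nodes and sending the data probe flow;
    S22: recording, at the sending end, the time at which the data probe flow is sent; recording, at the receiving end, the time at which the data probe flow is received; and computing the difference between the sending time and the receiving time to obtain a single one-way delay result;
    S23: repeating step S22 a predetermined number of times to obtain the predetermined number of one-way delay results; and
    S24: averaging the predetermined number of one-way delay results to obtain the one-way delay detection result.
  5. The method according to any one of claims 1 to 4, characterized in that, when the alarm program is started, a link anomaly localization algorithm is executed to locate the failed sub-link.
  6. The method according to claim 5, characterized in that the link anomaly localization algorithm comprises:
    inputting the time information, address information, and alarm threshold of the standby nodes, collecting statistics on the data probe flows between the standby nodes, and generating an abnormal flow set and a normal flow set;
    counting how often each sub-link in the abnormal flow set appears in the normal flow set;
    selecting the abnormal sub-links whose frequency is lower than a normal frequency threshold; and
    sorting the abnormal sub-links by frequency in ascending order.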
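  7. A network link monitoring device, characterized in that the device comprises:
    a master node module, configured to send a configuration file to multiple standby node modules, receive the one-way delay detection result returned by the standby node modules, and determine whether to trigger an alarm program according to a set alarm threshold; and
    a plurality of standby node modules, configured to receive the configuration file, execute a minimum link coverage algorithm to obtain a minimum link coverage set of the data probe flows between the standby node modules, then execute a one-way delay detection algorithm, perform one-way delay detection on the minimum link coverage set, obtain a one-way delay detection result, and return the one-way delay detection result to the master node module.
  8. The device according to claim 7, characterized in that the master node module comprises:
    a probe scheduling center module, configured to send a configuration file to the multiple standby node modules; and
    an alarm module, configured to receive the one-way delay detection result returned by the standby node modules and determine whether to trigger an alarm program according to the set alarm threshold.
  9. The device according to claim 7 or 8, characterized in that the standby node module comprises:
    a link coverage module, configured to receive the configuration file and execute the minimum link coverage algorithm to obtain the minimum link coverage set of the data probe flows between the standby node modules; and
    a delay detection module, configured to execute the one-way delay detection algorithm, perform one-way delay detection on the minimum link coverage set, obtain a one-way delay detection result, and return the one-way delay detection result to the master node module.
  10. The device according to any one of claims 7 to 9, characterized in that the standby node module further comprises:
    an anomaly localization module, configured to execute a link anomaly localization algorithm to locate the failed sub-link when the alarm program is started.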
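  11. A network system, characterized in that the system comprises:
    one or more processors;
    a memory;
    one or more programs, the one or more programs being stored on the memory;
    when executing the one or more programs, the one or more processors perform the operations according to any one of claims 1 to 6.
  12. A non-volatile computer storage medium, characterized in that the computer storage medium stores one or more programs that, when executed by one or more devices, cause the devices to perform the operations according to any one of claims 1 to 6.
PCT/CN2014/093557 2014-09-02 2014-12-11 Network link monitoring method and device, network system, and storage medium WO2016033897A1 (zh)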

Priority Applications (1)

Application Number Priority Date Filing Date Title
US14/902,308 US10033592B2 (en) 2014-09-02 2014-12-11 Method and system for monitoring network link and storage medium therefor

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201410443239.6A CN104202190B (zh) 2014-09-02 2014-09-02 Network link monitoring method and device
CN201410443239.6 2014-09-02

Publications (1)

Publication Number Publication Date
WO2016033897A1 (zh) 2016-03-10

Family

ID=52087420

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/093557 WO2016033897A1 (zh) 2014-09-02 2014-12-11 Network link monitoring method and device, network system, and storage medium

Country Status (3)

Country Link
US (1) US10033592B2 (zh)
CN (1) CN104202190B (zh)
WO (1) WO2016033897A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110262939A (zh) * 2019-05-14 2019-09-20 苏宁金融服务(上海)有限公司 Algorithm model operation monitoring method and apparatus, computer device, and storage medium

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
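CN104202190B (zh) * 2014-09-02 2017-07-25 百度在线网络技术(北京)有限公司 Network link monitoring method and device
JP2018032886A (ja) * 2015-01-08 2018-03-01 シャープ株式会社 Terminal device, base station device, wireless communication method, and integrated circuit
CN105763251B (zh) * 2016-04-19 2018-08-14 广东睿江云计算股份有限公司 Method and apparatus for monitoring optical fiber link quality
CN108965010A (zh) * 2018-07-19 2018-12-07 郑州云海信息技术有限公司 Method and system for monitoring network link flow control anomalies, and host bus adapter
CN112783677A (zh) * 2019-11-04 2021-05-11 北京京东尚科信息技术有限公司 Method and apparatus for monitoring service anomalies
CN111239584A (zh) * 2019-12-31 2020-06-05 视航机器人(佛山)有限公司 Mainboard testing device, system, and method applied to unmanned forklifts
CN111817911B (zh) * 2020-06-23 2023-08-08 腾讯科技(深圳)有限公司 Method, apparatus, computing device, and storage medium for detecting network quality
CN112491489B (zh) * 2020-11-27 2022-07-29 清华大学 Method, apparatus, and system for time synchronization based on in-band telemetry
CN115237727B (zh) * 2022-09-21 2022-12-02 云账户技术(天津)有限公司 Method, apparatus, electronic device, and storage medium for determining the most congested sub-link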

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102035696A (zh) * 2010-12-22 2011-04-27 中国工商银行股份有限公司 Method, apparatus, and system for monitoring website access performance
CN102468991A (zh) * 2010-11-15 2012-05-23 北京意科通信技术有限责任公司 Information transmission method and system
US20120162633A1 (en) * 2010-12-22 2012-06-28 Roberts Richard D Systems and methods for determining position using light sources
CN103384376A (zh) * 2012-05-04 2013-11-06 华为技术有限公司 Method, apparatus, and system for determining link coverage problems
CN104202190A (zh) * 2014-09-02 2014-12-10 百度在线网络技术(北京)有限公司 Network link monitoring method and device

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6842430B1 (en) * 1996-10-16 2005-01-11 Koninklijke Philips Electronics N.V. Method for configuring and routing data within a wireless multihop network and a wireless network for implementing the same
CN101030895A (zh) * 2006-03-01 2007-09-05 华为技术有限公司 Mobile agent system and constrained network tomography method
CN101051954A (zh) * 2007-04-26 2007-10-10 天津大学 Wireless sensor network algorithm test system and test method
US8543682B2 (en) * 2007-05-02 2013-09-24 Spirent Communications, Inc. Quality of experience indicator for network diagnosis
US8264953B2 (en) * 2007-09-06 2012-09-11 Harris Stratex Networks, Inc. Resilient data communications with physical layer link aggregation, extended failure detection and load balancing
US8787190B2 (en) * 2011-11-02 2014-07-22 Tt Government Solutions, Inc. Method, system, network nodes, routers and program for bandwidth estimation in multi-hop networks
EP2838264A4 (en) 2012-04-23 2016-01-06 Samsung Electronics Co Ltd Method for encoding multiview video using a multiview video prediction reference list and device therefor, and method for decoding multiview video using a multiview video prediction reference list and device therefor
US9503344B2 (en) * 2014-07-25 2016-11-22 Telefonaktiebolaget L M Ericsson (Publ) Data path performance measurement using network traffic in a software defined network

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
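CN110262939A (zh) * 2019-05-14 2019-09-20 苏宁金融服务(上海)有限公司 Algorithm model operation monitoring method and apparatus, computer device, and storage medium
CN110262939B (zh) * 2019-05-14 2023-07-21 苏宁金融服务(上海)有限公司 Algorithm model operation monitoring method and apparatus, computer device, and storage medium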

Also Published As

Publication number Publication date
US10033592B2 (en) 2018-07-24
US20160226714A1 (en) 2016-08-04
CN104202190A (zh) 2014-12-10
CN104202190B (zh) 2017-07-25

Similar Documents

Publication Publication Date Title
WO2016033897A1 (zh) Network link monitoring method and device, network system, and storage medium
US11502932B2 (en) Indirect testing using impairment rules
US11695648B2 (en) Method for supporting service level agreement monitoring in a software defined network and corresponding software defined network
US9100299B2 (en) Detecting error conditions in standby links
US7990887B2 (en) Sampling test of network performance
CA3090099A1 (en) Systems and methods for broadband communication link performance monitoring
Jin et al. Zooming in on wide-area latencies to a global cloud provider
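JP2020532216A (ja) Delay-based transmission path control method, network controller, and system
JP2009049708A (ja) Network failure information collection device, system, method, and program
JP2014068283A (ja) Network failure detection system and network failure detection device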
US20160308709A1 (en) Method and system for restoring qos degradations in mpls networks
CN107005437A (zh) Method and apparatus for network tomography
EP3295612B1 (en) Uplink performance management
US20100110918A1 (en) Method and apparatus for performance monitoring in a communications network
CN108259364B (zh) Network congestion determination method and apparatus
Huang et al. Practical issues with using network tomography for fault diagnosis
JP2008283621A (ja) Network congestion status monitoring device, network congestion status monitoring method, and program
US9124489B2 (en) Method, apparatus and system for setting a size of an event correlation time window
Cunha et al. Measurement methods for fast and accurate blackhole identification with binary tomography
US10608913B2 (en) Methods, systems, and computer readable media for conducting and validating network route convergence testing
Vuletić et al. Localization of network service performance degradation in multi-tenant networks
Merindol et al. A fine-grained multi-source measurement platform correlating routing transitions with packet losses
Huang et al. Overlay Routing Over an Uncooperative Underlay
US10462032B2 (en) Probing a network
CA3149650A1 (en) Methods and system for adaptive measurements applied to real time performance monitoring in a packet network

Legal Events

Date Code Title Description
WWE WIPO information: entry into national phase. Ref document number: 14902308; country of ref document: US
121 EP: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 14901434; country of ref document: EP; kind code of ref document: A1
NENP Non-entry into the national phase. Ref country code: DE
122 EP: PCT application non-entry in European phase. Ref document number: 14901434; country of ref document: EP; kind code of ref document: A1