WO2021218582A1 - Network performance monitoring method, network device, and storage medium - Google Patents

Network performance monitoring method, network device, and storage medium

Info

Publication number
WO2021218582A1
WO2021218582A1 PCT/CN2021/085816 CN2021085816W WO2021218582A1 WO 2021218582 A1 WO2021218582 A1 WO 2021218582A1 CN 2021085816 W CN2021085816 W CN 2021085816W WO 2021218582 A1 WO2021218582 A1 WO 2021218582A1
Authority
WO
WIPO (PCT)
Prior art keywords
network performance
network
time period
abnormal
threshold
Application number
PCT/CN2021/085816
Inventor
李发远 (Li Fayuan)
胡永健 (Hu Yongjian)
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Application filed by Huawei Technologies Co., Ltd.
Priority to JP2022566251A (published as JP2023523472A)
Priority to EP21796281.0A (published as EP4131856A4)
Publication of WO2021218582A1
Priority to US17/976,491 (published as US20230041307A1)

Classifications

    • H ELECTRICITY › H04 ELECTRIC COMMUNICATION TECHNIQUE › H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION (common to all entries below)
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
        • H04L41/06 Management of faults, events, alarms or notifications
        • H04L41/0631 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
        • H04L41/065 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis involving logical or physical relationship, e.g. grouping and hierarchies
        • H04L41/142 Network analysis or design using statistical or mathematical methods
    • H04L43/00 Arrangements for monitoring or testing data switching networks
        • H04L43/022 Capturing of monitoring data by sampling
        • H04L43/062 Generation of reports related to network traffic
        • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
        • H04L43/0817 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability by checking functioning
        • H04L43/0823 Errors, e.g. transmission errors
        • H04L43/0829 Packet loss
        • H04L43/0847 Transmission error
        • H04L43/0852 Delays
        • H04L43/087 Jitter
        • H04L43/0876 Network utilisation, e.g. volume of load or congestion level
        • H04L43/0894 Packet rate
        • H04L43/106 Active monitoring, e.g. heartbeat, ping or trace-route using time related information in packets, e.g. by adding timestamps
        • H04L43/16 Threshold monitoring
        • H04L43/20 Arrangements for monitoring or testing data switching networks the monitoring system or the monitored elements being virtualised, abstracted or software-defined entities, e.g. SDN or NFV

Definitions

  • This application relates to the field of network technology, and in particular to a network performance monitoring method, a network device, and a storage medium.
  • A network device acts as a network performance monitoring node. While forwarding data streams, it periodically collects local network performance data according to a preset collection cycle. Each time the network device collects network performance data, its central processing unit (CPU) reports the collected data to the Operation Support Systems (OSS), which analyzes and presents it.
  • The embodiments of this application provide a network performance monitoring method, a network device, and a storage medium, which reduce the dependence of network performance monitoring on the performance of the main control CPU.
  • The technical solutions are as follows.
  • In a first aspect, a network performance monitoring method is provided.
  • The forwarding plane samples network performance data based on a first time period and records the number of network performance abnormalities, where each time the network performance data obtained by a sampling meets a preset condition, one network performance abnormality is recorded.
  • The first time period is the sampling period at which the forwarding plane collects network performance data. The control plane determines that the number of network performance abnormalities in a second time period is greater than a first threshold, where the duration of the second time period is greater than the duration of the first time period, and the control plane then generates an alarm.
  • In other words, the forwarding plane samples network performance data at a fine-grained period and records the number of network performance abnormalities.
  • The control plane, working at a coarse-grained period, generates an alarm when the number of abnormalities recorded by the forwarding plane is greater than the threshold.
  • Because the control plane does not need to collect and report all network performance data, the amount of data it must report is greatly reduced.
  • This solves the problem of massive data reporting overloading the main control CPU, and reduces the dependence of network performance monitoring on the performance of the device's main control CPU.
  • It also solves the problem of massive data reporting consuming large amounts of bandwidth, reduces the dependence of network performance monitoring on bandwidth resources, and helps meet the needs of the large number of performance monitoring nodes deployed on live networks.
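The division of labour between the two planes can be illustrated with a minimal Python sketch. It assumes that the control plane reads the forwarding plane's abnormality counter once per second time period and resets it. The class names `ForwardingPlane` and `ControlPlane`, the read-and-reset behaviour, and the alarm text are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ForwardingPlane:
    """Samples once per first time period and counts network performance abnormalities."""
    is_abnormal: Callable[[float], bool]  # the preset condition
    abnormal_count: int = 0

    def on_sample(self, value: float) -> None:
        # Called once per first time period (for example, every millisecond).
        if self.is_abnormal(value):
            self.abnormal_count += 1

    def read_and_reset(self) -> int:
        # Hand over the count for the elapsed second time period and start a new one.
        count, self.abnormal_count = self.abnormal_count, 0
        return count


@dataclass
class ControlPlane:
    """Evaluates the abnormality count once per second time period."""
    first_threshold: int
    alarms: List[str] = field(default_factory=list)

    def on_period_end(self, fp: ForwardingPlane) -> Optional[str]:
        count = fp.read_and_reset()
        if count > self.first_threshold:
            alarm = f"performance abnormal: {count} abnormal samples"
            self.alarms.append(alarm)
            return alarm
        return None


fp = ForwardingPlane(is_abnormal=lambda v: v >= 0.7)  # hypothetical preset condition
cp = ControlPlane(first_threshold=3)
for v in [0.5, 0.8, 0.9, 0.75, 0.72, 0.4]:  # samples within one second time period
    fp.on_sample(v)
print(cp.on_period_end(fp))  # performance abnormal: 4 abnormal samples
```

Only one integer per second time period passes from the forwarding plane to the control plane. That is where the reduction in reported data comes from.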
  • The first time period is on the order of milliseconds, and the second time period is at least on the order of seconds.
  • Because the forwarding plane samples network performance data and records network performance abnormalities at a millisecond-level period, millisecond-level performance monitoring can be achieved, meeting customers' demand for it.
  • Because the control plane decides whether to generate an alarm at a period of at least seconds, the dependence of millisecond-level performance monitoring on bandwidth resources and on the performance of the main control CPU is reduced.
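As a rough illustration of the data reduction, the sketch below counts how many first-period samples fall into one second time period. It also counts how many values per hour a node would report under each approach. The 1 ms and 1 s periods are example values only. The sketch also simplifies by assuming one reported value per sample (or per window), for a single metric on a single monitored object.

```python
MS_PER_HOUR = 3_600_000


def samples_per_window(first_period_ms: int, second_period_ms: int) -> int:
    """Number of first-period samples that fall into one second time period."""
    if second_period_ms <= first_period_ms:
        raise ValueError("the second time period must be longer than the first")
    return second_period_ms // first_period_ms


def reported_values_per_hour(report_period_ms: int) -> int:
    """Values reported per hour if one value is reported every report_period_ms."""
    return MS_PER_HOUR // report_period_ms


# Reporting every 1 ms sample vs. one abnormality count per 1 s window.
per_sample = reported_values_per_hour(1)     # 3,600,000
per_window = reported_values_per_hour(1000)  # 3,600
print(samples_per_window(1, 1000), per_sample // per_window)  # 1000 1000
```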
  • The preset condition includes: the value of the network performance data obtained in a sampling is greater than or equal to a second threshold.
  • The forwarding plane can therefore decide whether to record a network performance abnormality simply by comparing the sampled value with the threshold. This is simple to implement and highly practical.
  • Recording the number of network performance abnormalities includes recording, for each of a plurality of abnormality levels, the number of network performance abnormalities corresponding to that level. The abnormality levels correspond respectively to a plurality of preset conditions: each time the network performance data obtained in a sampling meets the preset condition corresponding to an abnormality level, one network performance abnormality of that level is recorded.
  • By recording the number of abnormalities for each abnormality level separately, the forwarding plane monitors network performance at multiple abnormality levels, which makes the monitoring function finer-grained and more flexible.
  • When the alarm carries an abnormality level, it shows which level of network performance abnormality is currently occurring, helping users understand how severe the abnormality is.
  • The control plane determining that the number of network performance abnormalities in the second time period is greater than a first threshold includes: the control plane determining that, for multiple abnormality levels, the number of network performance abnormalities in the second time period is greater than the first threshold. The control plane generating an alarm includes: the control plane generating alarm information that indicates the highest of those abnormality levels.
  • When multiple abnormality levels meet the condition for triggering an alarm, the control plane generates alarm information only for the highest level and suppresses alarm information for the lower levels. This reduces the amount of alarm information and avoids disturbing users with too many alarms.
  • The preset condition corresponding to an abnormality level includes: the value of the network performance data obtained in a sampling is greater than or equal to a third threshold corresponding to that abnormality level. The higher the abnormality level, the higher its third threshold.
  • The forwarding plane thus records counts under different abnormality levels for network performance data of different values, which makes the network performance monitoring function finer-grained and more flexible.
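The control-plane rule of alarming only on the highest triggered level can be sketched as follows. The sketch represents abnormality levels as integers, with a larger number meaning a more severe level; that representation and the function name are assumptions made for illustration.

```python
from typing import Dict, Optional


def highest_alarm_level(counts_by_level: Dict[int, int], first_threshold: int) -> Optional[int]:
    """Return the most severe level whose count in the second time period exceeds
    the first threshold, or None if no level triggers an alarm.

    Lower levels that also exceed the threshold are suppressed, so at most one
    alarm level is reported per period.
    """
    triggered = [level for level, count in counts_by_level.items() if count > first_threshold]
    return max(triggered) if triggered else None


print(highest_alarm_level({1: 12, 2: 5}, first_threshold=3))  # 2 (both levels trigger; only level 2 is reported)
print(highest_alarm_level({1: 12, 2: 2}, first_threshold=3))  # 1
print(highest_alarm_level({1: 1, 2: 0}, first_threshold=3))   # None
```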
  • The network performance data includes at least one of: delay, packet loss, jitter, bandwidth, transmission rate, bit errors, and packet errors.
  • The method further includes: the forwarding plane determining network performance parameters for the second time period from the network performance data obtained in each sampling, and the control plane obtaining those parameters.
  • This meets the need for network performance statistics. Because the statistics task is offloaded to the forwarding plane, it also reduces the processing overhead that computing these statistics would otherwise place on the control plane.
  • The network performance parameters include at least one of: maximum delay, minimum delay, average delay, packet loss rate, jitter, bandwidth, transmission rate, bit error rate, and packet error rate.
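The sketch below shows one way the forwarding plane could maintain some of these parameters incrementally, without storing every millisecond sample. Both the running-statistics approach and the packet-loss-rate formula (lost packets divided by sent packets) are illustrative assumptions. The patent does not specify how the parameters are computed.

```python
from dataclasses import dataclass


@dataclass
class DelayStats:
    """Running delay statistics for one second time period (hypothetical helper)."""
    samples: int = 0
    total: float = 0.0
    max_delay: float = float("-inf")
    min_delay: float = float("inf")

    def add(self, delay: float) -> None:
        # O(1) update per sample, so the raw samples need not be stored.
        self.samples += 1
        self.total += delay
        self.max_delay = max(self.max_delay, delay)
        self.min_delay = min(self.min_delay, delay)

    def average(self) -> float:
        return self.total / self.samples if self.samples else 0.0


def packet_loss_rate(sent: int, received: int) -> float:
    """Fraction of packets lost in the period (assumed definition)."""
    return (sent - received) / sent if sent else 0.0


stats = DelayStats()
for d in [2.0, 5.0, 3.0]:  # delays in ms, one per sample
    stats.add(d)
print(stats.max_delay, stats.min_delay, round(stats.average(), 3))  # 5.0 2.0 3.333
print(packet_loss_rate(1000, 990))  # 0.01
```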
  • The method further includes: the control plane sending the alarm to the control and management device.
  • Reporting the alarm to the control and management device notifies the operator of the network performance abnormality in time. This helps the operator adjust the network promptly so that the cause of the abnormality is resolved quickly and user experience is not affected.
  • The method further includes: if, in multiple consecutive second time periods after the second time period, the number of network performance abnormalities is less than the first threshold, the control plane clears the alarm.
  • After reporting an alarm, the control plane clears it based on the number of abnormalities in multiple consecutive periods, which prevents the alarm from persisting indefinitely. Clearing the alarm also notifies the operator that network performance has returned to normal and that the fault causing the abnormality has been recovered.
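The raise-and-clear behaviour can be sketched as a small state machine evaluated once per second time period. The parameter `clear_periods` is hypothetical and sets how many consecutive quiet periods are needed to clear the alarm; the text only says "multiple consecutive second time periods". The sketch also assumes that a count exactly equal to the first threshold breaks the quiet streak.

```python
class AlarmState:
    """Raise/clear logic evaluated by the control plane once per second time period."""

    def __init__(self, first_threshold: int, clear_periods: int) -> None:
        self.first_threshold = first_threshold
        self.clear_periods = clear_periods  # hypothetical: consecutive quiet periods needed to clear
        self.active = False
        self._quiet_streak = 0

    def on_period(self, abnormal_count: int) -> str:
        """Return "raise", "clear", or "none" for the period that just ended."""
        if abnormal_count > self.first_threshold:
            self._quiet_streak = 0
            if not self.active:
                self.active = True
                return "raise"
            return "none"
        if self.active:
            if abnormal_count < self.first_threshold:
                self._quiet_streak += 1
                if self._quiet_streak >= self.clear_periods:
                    self.active = False
                    self._quiet_streak = 0
                    return "clear"
            else:
                # Count equal to the threshold: assumed to break the quiet streak.
                self._quiet_streak = 0
        return "none"


alarm = AlarmState(first_threshold=3, clear_periods=3)
print([alarm.on_period(c) for c in [5, 6, 1, 0, 2, 0]])
# ['raise', 'none', 'none', 'none', 'clear', 'none']
```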
  • In a second aspect, a network performance monitoring method is provided.
  • The control plane obtains the number of network performance abnormalities from the forwarding plane, where each time the network performance data obtained by a sampling meets a preset condition, one network performance abnormality is recorded.
  • The first time period is the sampling period at which the forwarding plane collects network performance data. The control plane determines that the number of network performance abnormalities in a second time period is greater than a first threshold, where the duration of the second time period is greater than the duration of the first time period, and the control plane then generates an alarm.
  • In a third aspect, a network device is provided that includes a main control board and an interface board. The main control board includes modules for executing the steps performed by the control plane in the method of the first aspect or any optional implementation of the first aspect. The interface board includes modules for executing the steps performed by the forwarding plane in that method.
  • In a fourth aspect, a network device is provided that includes a main control board and an interface board.
  • The main control board includes a first processor and a first memory.
  • The interface board includes a second processor, a second memory, and an interface card. The main control board is coupled to the interface board.
  • The first memory may be used to store program code.
  • The first processor is configured to invoke the program code in the first memory to perform the following operations: determining that the number of network performance abnormalities in a second time period is greater than a first threshold, where the duration of the second time period is greater than the duration of the first time period; and generating an alarm.
  • The second memory may be used to store program code.
  • The second processor is configured to invoke the program code in the second memory to trigger the interface card to perform the following operations: sampling network performance data based on a first time period and recording the number of network performance abnormalities, where each time the network performance data obtained by a sampling meets a preset condition, one network performance abnormality is recorded, and the first time period is the sampling period at which the forwarding plane collects network performance data.
  • An inter-process communication (IPC) channel is established between the main control board and the interface board, and the two boards communicate through this IPC channel.
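The sketch below illustrates why the IPC channel carries little traffic. It uses Python's `struct` module to pack one second-time-period report, holding per-level abnormality counts, into a compact binary message. The message layout, field widths, and function names are hypothetical. The source only states that the boards communicate over an IPC channel.

```python
import struct
from typing import Dict, Tuple

# Hypothetical wire layout (network byte order, no padding):
#   header: period_id (uint32), number of level entries (uint8)
#   entry:  abnormality level (uint8), abnormal count (uint32)
_HEADER = struct.Struct("!IB")
_ENTRY = struct.Struct("!BI")


def pack_report(period_id: int, counts_by_level: Dict[int, int]) -> bytes:
    """Serialize one period's per-level counts for the interface-board-to-main-board channel."""
    parts = [_HEADER.pack(period_id, len(counts_by_level))]
    for level, count in sorted(counts_by_level.items()):
        parts.append(_ENTRY.pack(level, count))
    return b"".join(parts)


def unpack_report(data: bytes) -> Tuple[int, Dict[int, int]]:
    """Parse a message produced by pack_report."""
    period_id, n_entries = _HEADER.unpack_from(data, 0)
    offset = _HEADER.size
    counts: Dict[int, int] = {}
    for _ in range(n_entries):
        level, count = _ENTRY.unpack_from(data, offset)
        counts[level] = count
        offset += _ENTRY.size
    return period_id, counts


report = pack_report(7, {1: 12, 2: 5})
print(len(report), unpack_report(report))  # 15 (7, {1: 12, 2: 5})
```

With two abnormality levels, each period's report is 15 bytes, however many millisecond samples were taken during the period.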
  • A computer-readable storage medium is provided. It stores at least one instruction that, when read by a processor, causes the forwarding plane and the control plane to execute the network performance monitoring method provided in the first aspect or any optional implementation of the first aspect.
  • A computer program product is provided. When it runs on a network device, the forwarding plane and the control plane of the network device execute the network performance monitoring method provided in the first aspect or any optional implementation of the first aspect.
  • A chip is provided. When the chip runs on a network device, the forwarding plane and the control plane of the network device execute the network performance monitoring method provided in the first aspect or any optional implementation of the first aspect.
  • FIG. 1 is a schematic diagram of a system architecture 100 provided by an embodiment of the present application.
  • FIG. 2 is a flowchart of a network performance monitoring method 200 provided by an embodiment of the present application.
  • FIG. 3 is a schematic structural diagram of a network device 300 provided by an embodiment of the present application.
  • FIG. 4 is a schematic structural diagram of a network device 400 provided by an embodiment of the present application.
  • The network performance monitoring method provided in the embodiments of this application can be applied to millisecond-level network performance monitoring scenarios.
  • These scenarios are briefly introduced below.
  • The current millisecond-level performance monitoring method is an extension of the second-level performance monitoring method.
  • Each performance monitoring node (for example, the forwarding plane of a device) collects network performance data at a millisecond sampling period. It either reports every millisecond-level sample to the device's main control in real time, or packages the samples and reports them to the main control at a certain period.
  • The device's main control then uses Telemetry or the Simple Network Management Protocol (SNMP) to report all sampled network performance data to the OSS, through the inter-device Data Communication Network (DCN) channel or an out-of-band DCN channel. The OSS analyzes and presents the network performance data.
  • The embodiments of this application provide a network performance monitoring method.
  • Even when the sampling period is set to the millisecond level, the method reduces the dependence of millisecond-level network performance monitoring on the performance of the device's main control CPU, while still meeting customers' demand for millisecond-level network performance monitoring.
  • It also reduces the dependence of millisecond-level network performance monitoring on DCN bandwidth resources, meeting the needs of the large number of millisecond-level performance monitoring nodes deployed on live networks.
  • System architecture 100 is an example of a network performance monitoring system.
  • System architecture 100 includes at least one network device and a control and management device. Each network device is connected to the control and management device through a network (such as a DCN).
  • Network devices in system architecture 100 include, but are not limited to, access network devices, aggregation network devices, and core network devices.
  • For example, a network device is a Packet Transport Network (PTN) device, an Agile Transport Network (ATN) device, an Optical Switch Network (OSN) device, a router, or a switch. Alternatively, the network device is another type of device that supports performance monitoring, for example a device that supports millisecond-level performance monitoring. This embodiment does not limit the type of network device.
  • For example, referring to FIG. 1, the network device is access network device 101, aggregation network device 102, aggregation network device 103, backbone aggregation network device 104, backbone aggregation network device 105, core network device 106, or core network device 107.
  • All or some of the network devices in system architecture 100 serve as performance monitoring nodes that execute method 200 described below.
  • The user specifies which network devices in system architecture 100 serve as performance monitoring nodes. For example, when a network device needs to be deployed as a performance monitoring node and the user enables its performance monitoring function, the network device executes method 200, triggered by the enable instruction.
  • The control and management device in system architecture 100 includes, but is not limited to, a network management device or a controller.
  • The network management device is, for example, OSS 110 in FIG. 1.
  • The controller is, for example, an SDN controller in a Software Defined Network (SDN), or a Virtualized Network Function Manager (VNFM) in Network Function Virtualization (NFV).
  • The network device and the control and management device in system architecture 100 can communicate in multiple ways.
  • For example, they communicate through Telemetry or SNMP, which are two optional ways of implementing the communication.
  • Alternatively, they communicate based on the Network Configuration Protocol (NETCONF).
  • System architecture 100 has been introduced above. Method 200 below illustrates the network performance monitoring procedure based on this architecture.
  • FIG. 2 is a flowchart of a network performance monitoring method 200 provided by an embodiment of the present application.
  • Method 200 includes S201 to S205.
  • Method 200 is executed by a network device in system architecture 100, specifically by the forwarding plane and the control plane of the same network device.
  • The forwarding plane performs the processing of S201.
  • The control plane performs the processing of S202 to S205.
  • S201: The forwarding plane samples network performance data based on the first time period and records the number of network performance abnormalities.
  • The first time period is the sampling period at which the forwarding plane collects network performance data.
  • The forwarding plane samples network performance data once every first time period and records the number of network performance abnormalities based on the sampled data.
  • The granularity (duration) of the first time period can take various values.
  • For example, the first time period is on the order of milliseconds: if the first time period is 1 millisecond, the forwarding plane collects network performance data every millisecond.
  • Sampling at a millisecond-level period helps the forwarding plane achieve millisecond-level performance monitoring. For illustration, network performance data is denoted by the letter p.
  • Over n milliseconds, the forwarding plane samples n network performance data values p_1, p_2, p_3, …, p_n, where p_i is the network performance data collected in the i-th millisecond and i is an integer with 1 ≤ i ≤ n.
  • This network performance data is collected internally by the forwarding plane and is not reported to the OSS for presentation to users.
  • The network performance data indicates network performance, for example the forwarding performance of the forwarding plane.
  • The network performance data includes at least one of: delay, packet loss, jitter, bandwidth, transmission rate, bit errors, and packet errors.
  • The objects whose network performance is monitored include at least one of a physical port, a tunnel, a pseudowire, or a virtual interface.
  • Accordingly, the network performance data includes at least one of the network performance data of a physical port, of a tunnel, of a pseudowire, or of a virtual interface.
  • Different components of the forwarding plane sample the network performance data of different monitored objects. For example, the physical interface card samples the network performance data of physical ports, and the network processor (NP) samples the network performance data of tunnels and pseudowires.
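Because several kinds of objects are monitored, the forwarding plane needs a separate abnormality counter for each monitored object and, in practice, for each metric. A minimal sketch of such a counter table follows. The key structure, the object identifiers such as `"port-1"`, and the class names are assumptions made for illustration.

```python
from collections import Counter
from enum import Enum
from typing import Tuple


class MonitoredObject(Enum):
    PHYSICAL_PORT = "physical port"
    TUNNEL = "tunnel"
    PSEUDOWIRE = "pseudowire"
    VIRTUAL_INTERFACE = "virtual interface"


# Hypothetical counter key: (object type, object identifier, metric name).
CounterKey = Tuple[MonitoredObject, str, str]


class AbnormalityTable:
    """Per-object, per-metric abnormality counters kept on the forwarding plane."""

    def __init__(self) -> None:
        self._counts: "Counter[CounterKey]" = Counter()

    def record(self, obj: MonitoredObject, obj_id: str, metric: str) -> None:
        self._counts[(obj, obj_id, metric)] += 1

    def count(self, obj: MonitoredObject, obj_id: str, metric: str) -> int:
        # Counter returns 0 for keys that were never recorded.
        return self._counts[(obj, obj_id, metric)]


table = AbnormalityTable()
table.record(MonitoredObject.PHYSICAL_PORT, "port-1", "delay")
table.record(MonitoredObject.PHYSICAL_PORT, "port-1", "delay")
table.record(MonitoredObject.TUNNEL, "tunnel-7", "packet_loss")
print(table.count(MonitoredObject.PHYSICAL_PORT, "port-1", "delay"))  # 2
```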
  • The forwarding plane can record the number of network performance abnormalities in multiple ways. For example, after each sampling, the forwarding plane judges whether the sampled network performance data meets the preset condition, and records one network performance abnormality each time it does.
  • The preset condition can be set in multiple ways.
  • For example, the preset condition includes: the value of the network performance data obtained in a sampling is greater than or equal to the second threshold.
  • The second threshold may also be called a performance over-limit threshold or a performance degradation threshold.
  • The number of network performance abnormalities may also be called the number of limit violations.
  • The second threshold can be set according to requirements or actual network conditions.
  • The second threshold includes but is not limited to at least one of a delay threshold, a packet loss threshold, a jitter threshold, a bandwidth threshold, a transmission rate threshold, a bit error threshold, and a packet error threshold.
  • For example, when the network performance data is the transmission rate, the second threshold is 70%.
  • The second threshold is preset by the user when enabling the network performance monitoring function, or is a default value. Taking a first time period of 1 millisecond as an example, the forwarding plane compares the network performance data collected each millisecond with the second threshold, and records one network performance abnormality whenever the collected value is greater than or equal to the second threshold.
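The per-millisecond comparison against the second threshold amounts to counting how many samples p_1, …, p_n meet the preset condition. The sketch below assumes that the transmission rate is expressed as a percentage of port bandwidth. The function name and sample values are illustrative only.

```python
from typing import Iterable


def count_limit_violations(samples: Iterable[float], second_threshold: float) -> int:
    """Count samples p_i whose value is >= the second threshold.

    Each sample is one value taken per first time period (for example, per millisecond).
    """
    return sum(1 for p in samples if p >= second_threshold)


# Hypothetical transmission-rate samples, as a percentage of port bandwidth.
rates = [65.0, 70.0, 88.5, 40.0]
print(count_limit_violations(rates, second_threshold=70.0))  # 2
```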
  • Network performance monitoring may have multiple abnormality levels. In this case, the forwarding plane records the number of network performance abnormalities for each abnormality level.
  • The abnormality levels correspond to multiple preset conditions, and the preset conditions of different abnormality levels can differ.
  • After each sampling, the forwarding plane determines whether the sampled network performance data meets each of the preset conditions. For any one of the abnormality levels, each time the sampled data meets the preset condition corresponding to that level, the forwarding plane records one network performance abnormality of that level.
  • the preset condition corresponding to the abnormal level includes: the value of the network performance data obtained in each sampling is greater than or equal to the third threshold corresponding to the abnormal level.
  • the plurality of third thresholds and the plurality of abnormality levels may have a one-to-one correspondence.
  • the third threshold and the second threshold mentioned above may be the same or different.
  • the third threshold corresponding to each abnormal level is preset by the user when the network performance monitoring function is enabled, or the third threshold corresponding to each abnormal level is a default value.
  • the third threshold includes but is not limited to at least one of a delay threshold, a packet loss threshold, a jitter threshold, a bandwidth threshold, a transmission rate threshold, a code error threshold, and a packet error threshold.
  • the higher the abnormality level the higher the third threshold corresponding to the abnormality level.
  • the network performance data is the transmission rate of the physical port
  • the third threshold is the transmission rate threshold of the physical port.
  • the transmission rate threshold corresponding to the low abnormality level is 70%
  • the transmission rate threshold corresponding to the high abnormality level is 85%.
  • The third threshold is denoted by $M$, the network performance data by $p$, and the number of network performance abnormalities by $\mathrm{num}$.
  • $M_i$ denotes the third threshold of abnormality level $i$, and $\mathrm{num}_i$ denotes the number of network performance abnormalities of abnormality level $i$, where $i$ is a positive integer with $1 \le i \le n$.
  • After obtaining network performance data $p_k$ by sampling at the k-th millisecond, the forwarding plane compares $p_k$ with each of the thresholds $M_1$ to $M_n$. If $M_i \le p_k < M_{i+1}$, the forwarding plane increments $\mathrm{num}_i$ by one. In other words, take two adjacent abnormality levels. If the sampled value reaches the threshold of the lower level but stays below the threshold of the next higher level, the number of network performance abnormalities of the lower level increases by one.
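The per-level counting rule $M_i \le p_k < M_{i+1}$ can be sketched as below. The sketch assumes that the third thresholds are passed in ascending order as a Python list, and the function name is hypothetical. Two further choices are assumptions that the text does not state. First, the highest level $n$ has no upper bound, as if $M_{n+1} = +\infty$. Second, samples below $M_1$ are not counted at any level.

```python
import bisect
from typing import Iterable, List


def count_by_level(samples: Iterable[float], thresholds: List[float]) -> List[int]:
    """Return [num_1, ..., num_n] for ascending thresholds [M_1, ..., M_n].

    A sample p is counted for level i when M_i <= p < M_{i+1}.
    The highest level has no upper bound (assumption), and samples
    below M_1 are not counted.
    """
    counts = [0] * len(thresholds)
    for p in samples:
        # bisect_right returns how many thresholds are <= p, i.e. the level i.
        level = bisect.bisect_right(thresholds, p)
        if level > 0:
            counts[level - 1] += 1  # levels are 1-based, list indices 0-based
    return counts


# Two levels: low = 70%, high = 85% (the physical-port example above).
print(count_by_level([0.50, 0.72, 0.85, 0.90, 0.84], [0.70, 0.85]))  # [2, 2]
```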
  • Besides recording the number of network performance abnormalities, the forwarding plane also determines network performance parameters for the second time period from the network performance data obtained in each sampling.
  • The network performance parameters include but are not limited to at least one of the following values for the second time period: the maximum, the minimum, or the average of the network performance data.
  • For example, the network performance parameters include at least one of maximum delay, minimum delay, average delay, packet loss rate, jitter, bandwidth, transmission rate, bit error rate, or packet error rate.
  • The second time period is denoted by T, and the maximum value of the network performance data in period T is denoted by Max.
  • Each time the forwarding plane obtains network performance data by sampling (every millisecond) in period T, it compares the sampled value with the recorded Max. If the sampled value is greater than the recorded Max, the forwarding plane updates Max to the sampled value. Otherwise, it keeps Max unchanged. At the end of period T, Max is the maximum value of the network performance data in that period.
  • Similarly, the minimum value of the network performance data in period T is denoted by Min.
  • Each time the forwarding plane obtains network performance data by sampling (every millisecond) in period T, it compares the sampled value with the recorded Min. If the sampled value is less than the recorded Min, the forwarding plane updates Min to the sampled value. Otherwise, it keeps Min unchanged. At the end of period T, Min is the minimum value of the network performance data in that period.
  • The average value of the network performance data in the second time period can be calculated in several ways.
  • For example, the average value of the network performance data in period T is denoted by Avg. The forwarding plane averages the values of the network performance data sampled every millisecond in period T to obtain Avg.
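A sketch of the running Max/Min/Avg bookkeeping for one period T follows. The class name and field names are hypothetical. Computing Avg from a running sum and sample count is an assumption: it is one way to average the samples without storing them, not necessarily the method used by the device.

```python
import math
from dataclasses import dataclass


@dataclass
class PeriodStats:
    """Running Max, Min and Avg of the samples taken in one period T."""
    max_value: float = -math.inf
    min_value: float = math.inf
    total: float = 0.0
    count: int = 0

    def update(self, value: float) -> None:
        # Max is replaced only when the new sample is strictly greater.
        if value > self.max_value:
            self.max_value = value
        # Min is replaced only when the new sample is strictly smaller.
        if value < self.min_value:
            self.min_value = value
        # A running sum lets Avg be computed without keeping every sample.
        self.total += value
        self.count += 1

    @property
    def avg(self) -> float:
        # NaN signals that no sample was taken in this period.
        return self.total / self.count if self.count else math.nan


stats = PeriodStats()
for sample in [3.0, 7.0, 5.0]:  # e.g. delays in ms, one per millisecond
    stats.update(sample)
print(stats.max_value, stats.min_value, stats.avg)  # 7.0 3.0 5.0
```

Resetting the statistics at the start of each period T amounts to creating a new `PeriodStats` instance.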
  • The control plane determines that the number of network performance abnormalities in the second time period is greater than a first threshold.
  • The second time period is longer than the first time period.
  • The granularity (duration) of the second time period can be chosen in several ways.
  • For example, the first time period is on the order of milliseconds, and the second time period is on the order of seconds or longer.
  • The duration of the second time period is greater than or equal to one second. For example, the second time period is 1 second, 10 seconds, 30 seconds, 1 minute, 5 minutes, 15 minutes, 30 minutes, or 1 hour.
  • The duration of the second time period is preset by the user when the network performance monitoring function is enabled, or it is a default value such as 30 seconds.
  • The first threshold may be referred to as an alarm threshold.
  • The first threshold is preset by the user when the network performance monitoring function is enabled, or it is a default value. In every second time period, the control plane compares the number of network performance abnormalities recorded by the forwarding plane in that period with the first threshold. If the number is greater than the first threshold, the control plane generates an alarm.
  • The second time period is denoted by T, the number of network performance abnormalities by $\mathrm{num}$, and the first threshold by Alam-num.
  • In each period T, the control plane compares $\mathrm{num}$ with Alam-num. If $\mathrm{num}$ is greater than Alam-num, the number of network performance abnormalities in the second time period has exceeded the first threshold (the alarm threshold). In that case, the control plane generates and reports an alarm.
  • The forwarding plane samples data and records counts with a fine-grained period, while the control plane reports with a coarse-grained period. The scheme therefore combines fine-grained monitoring on the forwarding plane with coarse-grained reporting on the control plane.
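The split between fine-grained counting on the forwarding plane and one comparison per period T on the control plane can be simulated as follows. This is a hypothetical single-process sketch. The parameter `samples_per_period` and the function name are assumptions; for example, a 1-second T with 1 ms sampling would use 1000 samples per period. A real device would run the two halves on different processors.

```python
from typing import List, Sequence


def alarm_periods(samples: Sequence[float], second_threshold: float,
                  samples_per_period: int, alam_num: int) -> List[int]:
    """Return the indices of the periods T whose abnormality count exceeds Alam-num."""
    alarms: List[int] = []
    for start in range(0, len(samples), samples_per_period):
        window = samples[start:start + samples_per_period]
        # Forwarding plane: one check per first time period (per sample).
        num = sum(1 for p in window if p >= second_threshold)
        # Control plane: one comparison per second time period T.
        if num > alam_num:
            alarms.append(start // samples_per_period)
    return alarms


print(alarm_periods([0.9, 0.9, 0.1, 0.1, 0.9, 0.9, 0.9, 0.1], 0.7, 4, 2))  # [1]
```

Period 0 in the example has exactly two violations. That count is not greater than Alam-num = 2, so only period 1 raises an alarm.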
  • The control plane can obtain the number of network performance abnormalities from the forwarding plane.
  • It can do so in several ways. Implementation 1 and implementation 2 below are examples.
  • Implementation 1: the control plane actively reads the number of network performance abnormalities.
  • The forwarding plane saves the number of network performance abnormalities in a memory, and the control plane reads the number from that memory.
  • The memory that stores the number can be of several kinds.
  • For example, the memory is on the interface board where the forwarding plane is located, such as a register on a physical interface card.
  • Alternatively, the memory is on the board where the control plane is located (such as the main control board). This embodiment does not limit the type of the memory.
  • Implementation 2: the control plane receives the number of network performance abnormalities reported by the forwarding plane.
  • After recording the number of network performance abnormalities, the forwarding plane sends the number to the control plane, and the control plane receives it.
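The two ways of obtaining the count can be contrasted with a small in-process model. Everything here is hypothetical. The class stands in for a register or memory location in implementation 1, and for a report message from the forwarding plane in implementation 2. A plain Python callback stands in for whatever channel a real device would use, such as an IPC channel.

```python
from typing import Callable, List


class ForwardingPlaneCounter:
    """Hypothetical stand-in for the forwarding plane's abnormality counter."""

    def __init__(self) -> None:
        self.num = 0  # models the memory or register holding the count
        self._listeners: List[Callable[[int], None]] = []

    def record_abnormality(self) -> None:
        self.num += 1

    def read(self) -> int:
        """Implementation 1: the control plane actively reads the count."""
        return self.num

    def subscribe(self, listener: Callable[[int], None]) -> None:
        """Register a control-plane receiver for implementation 2."""
        self._listeners.append(listener)

    def report(self) -> None:
        """Implementation 2: the forwarding plane sends the count."""
        for listener in self._listeners:
            listener(self.num)


counter = ForwardingPlaneCounter()
received: List[int] = []
counter.subscribe(received.append)
for _ in range(3):
    counter.record_abnormality()
print(counter.read())  # 3
counter.report()
print(received)        # [3]
```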
  • The control plane generates an alarm.
  • The alarm generated by the control plane can take several forms. Case A and case B below are examples.
  • Case A: the alarm generated by the control plane includes an alarm indication signal.
  • The alarm indication signal can be output in several forms: an alarm indicator that is turned on, turned off, or flashes at a certain frequency, or alarm audio.
  • The alarm indication signal itself can serve as the alarm.
  • Case B: the alarm generated by the control plane includes alarm information.
  • The alarm information indicates that the number of network performance abnormalities is greater than the first threshold.
  • The alarm information can include various content.
  • For example, the alarm information includes at least one of an alarm type, alarm source information, or a timestamp. Each of these is described below.
  • The alarm type includes at least one of abnormal delay, abnormal packet loss, abnormal jitter, abnormal bandwidth, abnormal transmission rate, abnormal bit errors, or abnormal packet errors.
  • For example, if the alarm type of the alarm information is abnormal delay, the alarm information indicates that the number of delay abnormalities is greater than the first threshold.
  • If the alarm type of the alarm information is abnormal transmission rate, the alarm information indicates that the number of transmission rate abnormalities is greater than the first threshold. For example, it indicates that the number of times the port rate was too slow exceeds the first threshold. Other alarm types work the same way.
  • The alarm source information indicates the network device that generated the alarm information.
  • The alarm source information is, for example, the name or the Internet Protocol (IP) address of the network device where the control plane is located. Carrying the alarm source information in the alarm information indicates which network device raised the alarm.
  • The timestamp indicates the time point at which the number of network performance abnormalities exceeded the first threshold. For example, when the control plane determines that the number in the second time period is greater than the first threshold, it writes the current time into the timestamp of the alarm information. The timestamp thus indicates when the control plane detected the abnormal network performance.
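The alarm information could carry the alarm type, alarm source information, and timestamp fields listed above in a shape like the one sketched below. The type and field names are hypothetical, and the enum values simply mirror the listed alarm types. The address `192.0.2.1` is a documentation address used only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum


class AlarmType(Enum):
    DELAY = "abnormal delay"
    PACKET_LOSS = "abnormal packet loss"
    JITTER = "abnormal jitter"
    BANDWIDTH = "abnormal bandwidth"
    TRANSMISSION_RATE = "abnormal transmission rate"
    BIT_ERROR = "abnormal bit error"
    PACKET_ERROR = "abnormal packet error"


@dataclass(frozen=True)
class AlarmInfo:
    alarm_type: AlarmType
    source: str          # name or IP address of the device raising the alarm
    timestamp: datetime  # when the count was found to exceed the first threshold


def make_alarm(alarm_type: AlarmType, source: str) -> AlarmInfo:
    # Write the current time into the timestamp, as the text describes.
    return AlarmInfo(alarm_type, source, datetime.now(timezone.utc))


alarm = make_alarm(AlarmType.DELAY, "192.0.2.1")
print(alarm.alarm_type.value, alarm.source)  # abnormal delay 192.0.2.1
```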
  • The alarm information may further indicate the abnormality level.
  • Alarm information for different abnormality levels differs, so it shows the abnormality level of the network performance abnormal event that occurred.
  • For example, the alarm information includes an alarm name, and alarm information for different abnormality levels carries different alarm names.
  • Alternatively, the alarm information includes alarm parameters, and alarm information for different abnormality levels carries different alarm parameters.
  • The control plane compares the number of network performance abnormalities of each abnormality level in the second time period with the first threshold. Take any one abnormality level. If the control plane determines that the number for that level in the second time period is greater than the first threshold, it generates alarm information indicating that level.
  • For example, the second time period is denoted by T, the number of network performance abnormalities of abnormality level n by $\mathrm{num}_n$, and the first threshold by Alam-num.
  • The control plane compares $\mathrm{num}_n$ with Alam-num. If it determines that $\mathrm{num}_n$ is greater than Alam-num within T, the control plane generates $\mathrm{Alam}_n$.
  • $\mathrm{Alam}_n$ denotes the alarm information indicating abnormality level n.
  • Several abnormality levels may have numbers greater than the first threshold in the second time period. In that case, the control plane generates alarm information only for the highest of those abnormality levels.
  • For example, with the first threshold denoted by Alam-num, suppose the numbers of network performance abnormalities of abnormality levels 1, 2, and 3 are all greater than Alam-num. The control plane then generates $\mathrm{Alam}_3$, the alarm information indicating abnormality level 3.
  • In this way, the control plane can report only the alarm information of the highest abnormality level to the upper-layer OSS and suppress the alarm information of lower abnormality levels. This reduces the number of reported alarms and keeps excessive alarm information from disturbing users.
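The highest-level selection can be sketched as a function over the per-level counts of one period T. The dictionary layout and function name are assumptions made for the example. As in the text, a larger level number means a more severe abnormality.

```python
from typing import Dict, Optional


def highest_alarm_level(num_by_level: Dict[int, int], alam_num: int) -> Optional[int]:
    """Return the highest level whose count exceeds Alam-num, or None.

    num_by_level maps abnormality level i to num_i for one period T.
    Lower levels that also exceed the threshold are suppressed.
    """
    exceeded = [level for level, num in num_by_level.items() if num > alam_num]
    return max(exceeded) if exceeded else None


print(highest_alarm_level({1: 12, 2: 9, 3: 7}, 5))  # 3 (only Alam_3 is reported)
print(highest_alarm_level({1: 12, 2: 4, 3: 0}, 5))  # 1
print(highest_alarm_level({1: 2, 2: 1, 3: 0}, 5))   # None
```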
  • The control plane sends the alarm to the control and management device.
  • The control plane reports alarms to the control and management device using the Telemetry or SNMP protocol. The reports travel over the DCN channel between devices or over an out-of-band DCN channel.
  • For example, the network device where the control plane is located is the access network device 101.
  • After generating an alarm, the access network device 101 sends the alarm to the OSS 110.
  • After receiving the alarm, the control and management device can analyze and present it.
  • In this way, the control plane can promptly notify the operator of abnormal performance events in the network. The operator can then adjust the network in time, so that abnormal network performance is resolved quickly and does not affect the user experience.
  • After obtaining the network performance parameters for the second time period, the control plane also sends them to the control and management device. For example, it sends the maximum, minimum, and average values of the network performance data in the second time period.
  • The control plane reports network performance parameters to the control and management device using the Telemetry or SNMP protocol, over the DCN channel between devices or over an out-of-band DCN channel. After receiving the network performance parameters, the control and management device can analyze and present them.
  • The timing or trigger condition for reporting network performance parameters can vary. Case I and case II below are examples.
  • Case I: the control plane reports network performance parameters once per second time period. In other words, in every second time period, the control plane sends the network performance parameters to the control and management device once.
  • In this case, reporting of network performance parameters does not depend on alarm reporting. The control plane reports the network performance parameters for the second time period whether the number of network performance abnormalities in that period is greater than, less than, or equal to the first threshold.
  • Case II: the control plane reports network performance parameters together with an alarm. In other words, when the control plane determines that the number of network performance abnormalities in the second time period is greater than the first threshold, it sends the control and management device both the alarm and the network performance parameters for that period.
  • The second time period may serve as the reporting period in which the control plane sends network performance parameters.
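Case I and case II differ only in whether the parameters are sent in every period or only together with an alarm. The sketch below builds the messages for one period T. The message dictionaries and the mode strings `"I"` and `"II"` are hypothetical and do not represent any Telemetry or SNMP encoding.

```python
from typing import Any, Dict, List


def messages_for_period(num: int, alam_num: int, params: Dict[str, float],
                        mode: str) -> List[Dict[str, Any]]:
    """Messages the control plane would send at the end of one period T.

    mode "I": send the parameters every period, alarm or not.
    mode "II": send the parameters only together with an alarm.
    """
    if mode not in ("I", "II"):
        raise ValueError(f"unknown reporting mode: {mode!r}")
    alarm = num > alam_num
    messages: List[Dict[str, Any]] = []
    if alarm:
        messages.append({"kind": "alarm", "num": num})
    if mode == "I" or alarm:
        messages.append({"kind": "params", **params})
    return messages


params = {"max": 9.0, "min": 1.0, "avg": 4.0}
print(messages_for_period(3, 5, params, "I"))   # parameters only
print(messages_for_period(3, 5, params, "II"))  # []
print(messages_for_period(7, 5, params, "II"))  # alarm, then parameters
```

The same structure applies to reporting the number of network performance abnormalities under case a and case b.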
  • S204 is optional. In other embodiments, the control plane does not perform S204.
  • For example, the control plane notifies the user of the detected abnormal network performance by outputting an alarm.
  • The control plane can also report the number of network performance abnormalities. Specifically, after obtaining the number, the control plane sends it to the control and management device, which receives, analyzes, and presents it. When network performance monitoring has multiple abnormality levels, the control plane optionally sends the number of network performance abnormalities of each abnormality level to the control and management device.
  • The network performance data may include at least one of delay, packet loss, jitter, bandwidth, transmission rate, bit errors, and packet errors.
  • Correspondingly, the control plane optionally sends the control and management device at least one of the following numbers: delay abnormalities, packet loss abnormalities, jitter abnormalities, bandwidth abnormalities, transmission rate abnormalities, bit error abnormalities, and packet error abnormalities.
  • The timing or trigger condition for reporting the number of network performance abnormalities can vary. Case a and case b below are examples.
  • Case a: the control plane reports the number of network performance abnormalities once per second time period. In other words, in every second time period, the control plane sends the number to the control and management device once.
  • In this case, reporting of the number does not depend on alarm reporting. The control plane reports the number for the second time period whether that number is greater than, less than, or equal to the first threshold.
  • Case b: the control plane reports the number of network performance abnormalities together with an alarm. In other words, when the control plane determines that the number in the second time period is greater than the first threshold, it sends the control and management device both the alarm and the number for that period.
  • The control plane also supports clearing alarms. Specifically, the control plane may determine that the number of network performance abnormalities in the second time period is greater than the first threshold and generate an alarm. After that, the forwarding plane continues to sample network performance data every first time period and to record the number of network performance abnormalities. Based on these records, the control plane keeps checking, for each subsequent second time period, whether the number is less than the first threshold. If the number stays below the first threshold for a number of consecutive second time periods after the alarmed period, the control plane clears the alarm.
  • For example, the control plane clears the alarm by sending the control and management device a notification message indicating that the alarm has disappeared.
  • For example, the second time period is denoted by T. Suppose the control plane reports alarm $\mathrm{Alam}_n$ in period $T_i$. If the number of network performance abnormalities recorded in each of the N consecutive periods $T_{i+1}$ to $T_{i+N}$ is less than the threshold Alam-num, the control plane reports that alarm $\mathrm{Alam}_n$ has disappeared.
  • N is preset by the user when the network performance monitoring function is enabled, or it is a default value, for example 3.
  • The control plane clears an alarm based on the numbers in multiple consecutive periods, so alarms that have been generated do not linger indefinitely. Clearing the alarm also tells the operator that network performance is back to normal and that the fault behind the abnormality has been resolved.
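The raise-and-clear behavior can be replayed over a sequence of per-period counts. This sketch assumes a single alarm of one abnormality level and uses hypothetical names. A period whose count equals Alam-num breaks the run of quiet periods, because the clearing condition above requires the count to be less than the threshold.

```python
from typing import Iterable, List, Tuple


def alarm_events(nums: Iterable[int], alam_num: int, n_clear: int = 3) -> List[Tuple[int, str]]:
    """Replay per-period counts and return (period index, event) pairs.

    An alarm is raised when num > alam_num. An active alarm is cleared once
    num < alam_num holds for n_clear consecutive periods (N in the text).
    """
    events: List[Tuple[int, str]] = []
    active = False
    quiet = 0  # consecutive periods with num below the threshold
    for i, num in enumerate(nums):
        if not active:
            if num > alam_num:
                active = True
                quiet = 0
                events.append((i, "raise"))
        elif num < alam_num:
            quiet += 1
            if quiet == n_clear:
                active = False
                events.append((i, "clear"))
        else:
            quiet = 0  # the run of below-threshold periods is broken
    return events


print(alarm_events([2, 8, 1, 9, 0, 0, 0, 4], alam_num=5))  # [(1, 'raise'), (6, 'clear')]
```

In the example, the count of 9 in period 3 interrupts the first quiet run. The alarm therefore clears only after periods 4 to 6.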
  • S205 is an optional step, and in other embodiments, the control plane does not execute S205.
  • In summary, the forwarding plane samples network performance data in fine-grained time periods and records the number of network performance abnormalities. The control plane generates an alarm when the number recorded by the forwarding plane in a coarse-grained time period exceeds the threshold.
  • The control plane does not need to collect and report all network performance data, which greatly reduces the amount of data it must report.
  • This avoids overloading the main control CPU with massive data reporting and reduces the dependence of network performance monitoring on the performance of the device's main control CPU.
  • It also avoids consuming large amounts of bandwidth on massive data reporting. Network performance monitoring therefore depends less on bandwidth resources, which helps support deploying a large number of performance monitoring nodes on live networks.
  • Method 200 of the embodiments of this application has been described above. The network device of the embodiments is described below. The network device described below has any function of the forwarding plane and the control plane in method 200.
  • FIG. 3 is a schematic structural diagram of a network device 300 according to an embodiment of this application.
  • The network device 300 includes the following modules. A sampling module 301 performs the sampling step in S201. A recording module 302 performs the recording step in S201. A determining module 303 performs S202. A generating module 304 performs S203.
  • Optionally, the network device 300 further includes a sending module, configured to perform S204.
  • Optionally, the network device 300 further includes a cancellation module, configured to perform S205.
  • The sampling module 301 and the recording module 302 correspond to the forwarding plane in method 200. They implement the steps and methods implemented by the forwarding plane in method 200.
  • The sampling module 301 and the recording module 302 are the same concept as the forwarding plane described above.
  • For the specific implementation process, refer to the corresponding process of the forwarding plane in method 200. Details are not repeated here.
  • The determining module 303, the generating module 304, the sending module, and the cancellation module correspond to the control plane in method 200.
  • They implement the steps and methods implemented by the control plane in method 200.
  • They are the same concept as the control plane described above.
  • For example, each functional module in the network device 300 is implemented by software.
  • The sampling module 301 and the recording module 302 are virtual modules generated after a processor of the forwarding plane reads program code.
  • The determining module 303, the generating module 304, the sending module, and the cancellation module are virtual modules generated after a processor of the control plane reads program code.
  • When the network device 300 monitors network performance, this division into functional modules is only an example. In practical applications, the functions can be allocated to different functional modules as required. That is, the internal structure of the forwarding plane or the control plane can be divided into different functional modules to complete all or some of the functions described above.
  • An embodiment of this application further provides a network device 400.
  • The hardware structure of the network device 400 is described below.
  • The network device 400 corresponds to the forwarding plane and the control plane in method 200.
  • The hardware, modules, and other operations and/or functions of the network device 400 implement the steps and methods of the forwarding plane and the control plane in method 200, respectively.
  • For details of how the network device 400 monitors network performance, refer to method 200.
  • The steps of method 200 are completed by integrated hardware logic circuits in the processor of the network device 400 or by instructions in the form of software.
  • The steps of the method disclosed in the embodiments of this application may be performed directly by a hardware processor, or by a combination of hardware and software modules in the processor.
  • The software module can be located in a storage medium well established in the art, such as a random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or register.
  • The storage medium is located in the memory. The processor reads information in the memory and completes the steps of the above method together with its hardware. To avoid repetition, details are not described here.
  • The network device 400 corresponds to the network device 300 in the foregoing virtual apparatus embodiment. Each functional module in the network device 300 is implemented by software of the network device 400.
  • That is, the functional modules of the network device 300 are generated after the processor of the network device 400 reads program code stored in the memory.
  • FIG. 4 is a schematic structural diagram of a network device according to an example embodiment of this application.
  • The network device 400 includes a main control board 410 and an interface board 430.
  • The main control board is also called a main processing unit (MPU) or a route processor card.
  • The main control board 410 controls and manages the components of the network device 400, including route calculation, device management, device maintenance, and protocol processing.
  • The main control board 410 includes a central processing unit 411 and a memory 412.
  • The interface board 430 is also called a line processing unit (LPU), a line card, or a service board.
  • The interface board 430 provides various service interfaces and forwards data packets.
  • Service interfaces include, but are not limited to, Ethernet interfaces and POS (Packet over SONET/SDH) interfaces.
  • The Ethernet interface is, for example, a Flexible Ethernet client (FlexE Client).
  • The interface board 430 includes a central processing unit 431, a network processor 432, a forwarding entry memory 434, and a physical interface card (PIC) 433.
  • The central processing unit 431 on the interface board 430 controls and manages the interface board 430 and communicates with the central processing unit 411 on the main control board 410.
  • The network processor 432 implements packet forwarding.
  • The network processor 432 may take the form of a forwarding chip.
  • The network processor 432 forwards received packets based on the forwarding table stored in the forwarding entry memory 434. If the destination address of a packet is an address of the network device 400, the packet is sent to a CPU (such as the central processing unit 411) for processing. Otherwise, the next hop and outbound interface for the destination address are looked up in the forwarding table, and the packet is forwarded to that outbound interface.
  • Processing of an upstream packet includes inbound interface processing and forwarding table lookup. Processing of a downstream packet includes forwarding table lookup, among other steps.
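The forwarding decision of the network processor 432 can be illustrated with Python's standard `ipaddress` module. This is a software sketch, not a description of how a forwarding chip works. The table layout and interface names are hypothetical. Using longest-prefix match to pick the entry is also an assumption: it is the usual rule for IP route lookup, but the text above only says the entry is found according to the destination address.

```python
import ipaddress
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical forwarding table: prefix -> (next hop, outbound interface).
Route = Tuple[str, str]
ForwardingTable = Dict[ipaddress.IPv4Network, Route]


def forward(dst: str, local_addrs: Iterable[str], table: ForwardingTable) -> Optional[Route]:
    """Decide what to do with a packet addressed to dst.

    Returns ("CPU", "local") if dst is one of the device's own addresses,
    the (next hop, outbound interface) of the best route otherwise,
    or None if no route matches.
    """
    addr = ipaddress.ip_address(dst)
    if addr in {ipaddress.ip_address(a) for a in local_addrs}:
        return ("CPU", "local")  # addressed to this device: hand to the CPU
    # Longest-prefix match (assumed): the most specific matching prefix wins.
    matches = [net for net in table if addr in net]
    if not matches:
        return None
    return table[max(matches, key=lambda net: net.prefixlen)]


table: ForwardingTable = {
    ipaddress.ip_network("10.0.0.0/8"): ("192.0.2.1", "if-1"),
    ipaddress.ip_network("10.1.0.0/16"): ("192.0.2.2", "if-2"),
}
local = ["198.51.100.1"]
print(forward("10.1.2.3", local, table))      # ('192.0.2.2', 'if-2')
print(forward("10.9.9.9", local, table))      # ('192.0.2.1', 'if-1')
print(forward("198.51.100.1", local, table))  # ('CPU', 'local')
print(forward("203.0.113.5", local, table))   # None
```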
  • The physical interface card 433 provides the physical-layer interconnection. Original traffic enters the interface board 430 through it, and processed packets are sent out through it.
  • The physical interface card 433, also called a daughter card, can be installed on the interface board 430. It converts optical/electrical signals into packets, checks the validity of the packets, and forwards them to the network processor 432 for processing.
  • The central processing unit can also perform the functions of the network processor 432, for example, by implementing software forwarding on a general-purpose CPU. In that case, the network processor 432 is not required in the physical interface card 433.
  • the network device 400 includes multiple interface boards.
  • the network device 400 further includes an interface board 440.
  • the interface board 440 includes a central processing unit 441, a network processor 442, a forwarding entry memory 444, and a physical interface card 443.
  • the network device 400 further includes a switching network board 420.
  • the switching network board 420 may also be referred to as a switch fabric unit (SFU).
  • the switching network board 420 is used to complete data exchange between the interface boards.
  • the interface board 430 and the interface board 440 may communicate with each other through the switching network board 420.
  • the main control board 410 and the interface board 430 are coupled.
  • the main control board 410, the interface board 430, the interface board 440, and the switching network board 420 are connected to the system backplane through a system bus to achieve intercommunication.
  • an inter-process communication protocol (IPC) channel is established between the main control board 410 and the interface board 430, and the main control board 410 and the interface board 430 communicate through the IPC channel.
  • the network device 400 includes a control plane and a forwarding plane.
  • the control plane includes a main control board 410 and a central processing unit 431.
  • the forwarding plane includes components that perform forwarding, such as the forwarding entry memory 434, the physical interface card 433, and the network processor 432.
  • the control plane performs functions such as route computation, forwarding table generation, processing of signaling and protocol messages, and configuration and maintenance of device status.
  • the control plane issues the generated forwarding tables to the forwarding plane.
  • the network processor 432 looks up and forwards, based on the forwarding table issued by the control plane, the messages received by the physical interface card 433.
  • the forwarding table issued by the control plane may be stored in the forwarding entry memory 434.
  • the interface board 430 or the interface board 440 is used to perform steps corresponding to the forwarding plane.
  • the physical interface card 433 samples the network performance data of the physical ports in the physical interface card 433 based on the first time period, records the number of network performance anomalies, and stores that number in the forwarding entry memory 434.
  • the network processor 432 samples the network performance data of the tunnel based on the first time period, records the number of network performance anomalies, and stores that number in the forwarding entry memory 434.
  • the main control board 410 is used to execute the steps corresponding to the control plane.
  • the central processing unit 431 reads the number of network performance anomalies from the forwarding entry memory 434, determines that the number within the second time period is greater than the first threshold, and generates an alarm.
  • the sampling module 301 and the recording module 302 in the network device 300 correspond to the interface board 430 or the interface board 440 in the network device 400; the determining module 303, the generating module 304, the sending module, and the cancellation module in the network device 300 may correspond to the main control board 410.
  • there may be one or more main control boards; when there are several, they may include an active main control board and a standby main control board.
  • there may be no switching network board, or there may be one or more; when there are several, they can jointly implement load sharing and redundant backup. Under the centralized forwarding architecture, the network device may not need a switching network board, and the interface board handles the service data processing of the entire system.
  • under the distributed forwarding architecture, the network device can have at least one switching network board, through which data is exchanged between multiple interface boards, providing large-capacity data exchange and processing capabilities. Therefore, the data access and processing capabilities of network equipment with a distributed architecture are greater than those of equipment with a centralized architecture.
  • the network device may also have only one board, that is, there is no switching network board, and the functions of the interface board and the main control board are integrated on that one board.
  • in that case, the central processing unit on the interface board and the central processing unit on the main control board can be combined into one central processing unit on the single board, which performs the combined functions of both.
  • the data exchange and processing capability of a device in this form is relatively low (for example, low-end switches, routers, or other network devices).
  • the specific architecture used depends on the specific networking deployment scenario, and there is no restriction here.
  • the foregoing forwarding plane and control plane may also be implemented using computer program products.
  • an embodiment of the present application provides a computer program product, when the computer program product runs on a network device, the forwarding plane and the control plane of the network device respectively execute the network performance monitoring method in the above method 200.
  • the forwarding plane and the control plane in the various product forms described above respectively have any function of the forwarding plane and the control plane in method 200, and details are not repeated here.
  • the disclosed system, device, and method can be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division into units is merely a logical function division and may differ in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may also be electrical, mechanical or other forms of connection.
  • the unit described as a separate component may or may not be physically separated, and the component displayed as a unit may or may not be a physical unit, that is, it may be located in one place, or may also be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments of the present application.
  • the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
  • if the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium.
  • the technical solutions of this application essentially, or the part contributing to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods in the embodiments of this application.
  • the foregoing storage media include any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
  • the computer program product includes one or more computer program instructions.
  • the computer can be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
  • the computer instructions can be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium.
  • the computer program instructions can be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired or wireless manner.
  • the computer-readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server or data center integrated with one or more available media.
  • the usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a digital video disc (DVD)), or a semiconductor medium (for example, a solid-state drive).
  • the terms "first", "second", and the like in this application are used to distinguish between identical or similar items whose functions are basically the same. It should be understood that "first" and "second" imply no logical or temporal dependency and limit neither quantity nor execution order. It should also be understood that although the description uses the terms first, second, and so on to describe various elements, these elements should not be limited by the terms, which are used only to distinguish one element from another.
  • the first threshold value may be referred to as the second threshold value, and similarly, the second threshold value may be referred to as the first threshold value. Both the first threshold and the second threshold may be thresholds, and in some cases, may be separate and different thresholds.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Environmental & Geological Engineering (AREA)
  • Health & Medical Sciences (AREA)
  • Cardiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

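A network performance monitoring method, a network device, and a storage medium, belonging to the field of network technologies. A forwarding plane samples network performance data at a fine-grained time period and records the number of network performance anomalies. At a coarse-grained time period, a control plane generates an alarm when the number of anomalies recorded by the forwarding plane is greater than a threshold. This meets the need for fine-grained network performance monitoring, and because the control plane does not need to report all of the collected network performance data, the amount of data the control plane has to report is greatly reduced. On the one hand, this solves the problem of the main control CPU being overloaded by massive data reporting and reduces the dependence of network performance monitoring on the performance of the device's main control CPU. On the other hand, it solves the problem of massive data reporting occupying a large amount of bandwidth, reduces the dependence of network performance monitoring on bandwidth resources, and helps meet the need to deploy performance monitoring nodes at scale on live networks.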

Description

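Network performance monitoring method, network device, and storage medium
This application claims priority to Chinese Patent Application No. 202010359259.0, filed on April 29, 2020 and entitled "Network performance monitoring method, network device, and storage medium", which is incorporated herein by reference in its entirety.
Technical Field
This application relates to the field of network technologies, and in particular, to a network performance monitoring method, a network device, and a storage medium.
Background
As users' requirements for network performance gradually rise, network performance needs to be monitored so that operators can adjust the network promptly when performance degrades.
Currently, a network device acts as a network performance monitoring node: while forwarding data flows, it periodically collects local network performance data according to a preset collection period. Each time the network device collects network performance data, its main control central processing unit (CPU) reports the collected data to an operation support system (OSS) for analysis and presentation.
With this method, the volume of reported network performance data is huge. This places very high demands on the main control CPU of the network device and can easily overload it.
Summary
Embodiments of this application provide a network performance monitoring method, a network device, and a storage medium, which can reduce the dependence of the network performance monitoring process on the performance of the main control CPU. The technical solutions are as follows.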
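According to a first aspect, a network performance monitoring method is provided. In this method, a forwarding plane samples network performance data based on a first time period and records the number of network performance anomalies. Each time the network performance data obtained by a sample meets a preset condition, one network performance anomaly is recorded. The first time period is the sampling period at which the forwarding plane collects network performance data. A control plane determines that the number of network performance anomalies within a second time period is greater than a first threshold, where the duration of the second time period is greater than the duration of the first time period. The control plane then generates an alarm.
With this method, the forwarding plane samples network performance data at a fine-grained time period and records the number of network performance anomalies. At a coarse-grained time period, the control plane generates an alarm when the number of anomalies recorded by the forwarding plane is greater than a threshold. This meets the need for fine-grained network performance monitoring, and because the control plane does not need to report all of the collected network performance data, the amount of data the control plane has to report is greatly reduced. On the one hand, this solves the problem of the main control CPU being overloaded by massive data reporting and reduces the dependence of network performance monitoring on the performance of the device's main control CPU. On the other hand, it solves the problem of massive data reporting occupying a large amount of bandwidth, reduces the dependence of network performance monitoring on bandwidth resources, and helps meet the need to deploy performance monitoring nodes at scale on live networks.
Optionally, the first time period is on the order of milliseconds, and the second time period is at least on the order of seconds.
In this optional manner, the forwarding plane samples network performance data and records the number of anomalies at a millisecond-level period. This enables millisecond-level performance monitoring and meets customers' demand for it. In addition, because the control plane decides whether to generate an alarm at a period of at least one second, the dependence of millisecond-level performance monitoring on bandwidth resources and on the performance of the main control CPU can be reduced.
Optionally, the preset condition includes: the value of the network performance data obtained by each sample is greater than or equal to a second threshold.
In this optional manner, the forwarding plane only needs to compare the value of the network performance data with a threshold to decide whether to record one network performance anomaly. This is simple to implement and therefore highly practical.
Optionally, recording the number of network performance anomalies includes recording, for each of multiple anomaly levels, the number of network performance anomalies corresponding to that level. The multiple anomaly levels correspond to multiple preset conditions respectively. Each time the network performance data obtained by a sample meets the preset condition corresponding to an anomaly level, one network performance anomaly of that level is recorded.
In this optional manner, the forwarding plane records the number of anomalies separately for each anomaly level and thus monitors network performance at multiple anomaly levels, which makes monitoring finer-grained and more flexible. In particular, when the alarm carries the anomaly level, it shows which level of network performance anomaly has occurred, which helps the user understand how severe the current anomaly is.
Optionally, the step in which the control plane determines that the number of network performance anomalies within the second time period is greater than the first threshold includes: the control plane determines that, within the second time period, the numbers of network performance anomalies corresponding to the multiple anomaly levels are all greater than the first threshold. Correspondingly, generating the alarm includes: the control plane generates alarm information indicating the highest of the multiple anomaly levels.
In this optional manner, when multiple anomaly levels all meet the condition for triggering an alarm, the control plane only needs to generate alarm information for the highest anomaly level and suppresses alarm information for the lower levels. This reduces the amount of alarm information generated and prevents excessive alarms from disturbing the user.
Optionally, the preset condition corresponding to an anomaly level includes: the value of the network performance data obtained by each sample is greater than or equal to a third threshold corresponding to the anomaly level. A higher anomaly level corresponds to a higher third threshold.
In this optional manner, different thresholds are set for the value of the network performance data according to the anomaly level. The forwarding plane therefore records counts for different anomaly levels depending on the value of the sampled data, which makes monitoring finer-grained and more flexible.
Optionally, the network performance data includes at least one of the following: delay, packet loss, jitter, bandwidth, transmission rate, bit errors, and errored packets.
In this optional manner, it becomes clear in which dimension network performance is abnormal. Monitoring in multiple dimensions is also supported, which makes monitoring more comprehensive and applicable to more scenarios.
Optionally, the method further includes: the forwarding plane determines network performance parameters within the second time period based on the network performance data obtained by each sample, and the control plane obtains the network performance parameters.
In this optional manner, the need to compute network performance statistics is met. Because this task is offloaded to the forwarding plane, the processing overhead it would otherwise impose on the control plane is reduced.
Optionally, the network performance parameters include at least one of the following: maximum delay, minimum delay, average delay, packet loss rate, jitter, bandwidth, transmission rate, bit error rate, and errored packet rate.
In this optional manner, the need to compute network performance parameters in multiple dimensions is met, which makes monitoring more comprehensive and applicable to more scenarios.
Optionally, after the control plane generates the alarm, the method further includes: the control plane sends the alarm to a control and management device.
In this optional manner, the control plane reports the alarm to the control and management device, so the operator is promptly notified that a performance anomaly has occurred in the network. The operator can then adjust the network in time, the anomaly is resolved promptly, and user experience is not affected.
Optionally, after the control plane sends the alarm to the control and management device, the method further includes: if the number of network performance anomalies is less than the first threshold in each of multiple consecutive second time periods following the second time period, the control plane cancels the alarm.
In this optional manner, the control plane cancels a reported alarm based on the anomaly counts of multiple consecutive periods, so stale alarms do not linger. Cancelling the alarm also notifies the operator that network performance has returned to normal and that the fault that caused the anomaly has been recovered.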
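According to a second aspect, a network performance monitoring method is provided. In this method, a control plane obtains the number of network performance anomalies from a forwarding plane. The forwarding plane samples network performance data based on a first time period, which is the sampling period at which it collects network performance data, and records one network performance anomaly each time the network performance data obtained by a sample meets a preset condition. The control plane determines that the number of network performance anomalies within a second time period is greater than a first threshold, where the duration of the second time period is greater than the duration of the first time period. The control plane then generates an alarm.
According to a third aspect, a network device is provided. The network device includes a main control board and an interface board. The main control board includes modules configured to perform the control-plane steps of the method in the first aspect or any optional manner of the first aspect. The interface board includes modules configured to perform the forwarding-plane steps of that method.
According to a fourth aspect, a network device is provided. The network device includes a main control board and an interface board. The main control board includes a first processor and a first memory. The interface board includes a second processor, a second memory, and an interface card. The main control board is coupled to the interface board.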
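The first memory may be configured to store program code, and the first processor is configured to invoke the program code in the first memory to perform the following operations: determining that the number of network performance anomalies within a second time period is greater than a first threshold, where the duration of the second time period is greater than the duration of a first time period; and generating an alarm.
The second memory may be configured to store program code, and the second processor is configured to invoke the program code in the second memory to trigger the interface card to perform the following operations: sampling network performance data based on the first time period and recording the number of network performance anomalies. One network performance anomaly is recorded each time the network performance data obtained by a sample meets a preset condition, and the first time period is the sampling period at which the forwarding plane collects network performance data.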
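In a possible implementation, an inter-process communication (IPC) channel is established between the main control board and the interface board, and the two communicate through the IPC channel.
According to a fifth aspect, a computer-readable storage medium is provided. The storage medium stores at least one instruction. A processor reads the instruction to cause a forwarding plane and a control plane to perform the network performance monitoring method provided in the first aspect or any optional manner of the first aspect.
According to a sixth aspect, a computer program product is provided. When the computer program product runs on a network device, it causes the forwarding plane and the control plane of the network device to perform the network performance monitoring method provided in the first aspect or any optional manner of the first aspect.
According to a seventh aspect, a chip is provided. When the chip runs on a network device, it causes the forwarding plane and the control plane of the network device to perform the network performance monitoring method provided in the first aspect or any optional manner of the first aspect.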
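Brief Description of Drawings
FIG. 1 is a schematic diagram of a system architecture 100 according to an embodiment of this application;
FIG. 2 is a flowchart of a network performance monitoring method 200 according to an embodiment of this application;
FIG. 3 is a schematic structural diagram of a network device 300 according to an embodiment of this application;
FIG. 4 is a schematic structural diagram of a network device 400 according to an embodiment of this application.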
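Description of Embodiments
To make the objectives, technical solutions, and advantages of this application clearer, the following further describes the implementations of this application in detail with reference to the accompanying drawings.
The following describes application scenarios of this application by way of example.
The network performance monitoring method provided in the embodiments of this application can be applied to millisecond-level network performance monitoring. The following briefly introduces this scenario.
As living standards improve, users' expectations of network experience rise as well. Operators therefore need to monitor network performance in real time and adjust the network promptly based on it. Otherwise, anomalies such as unbalanced traffic or network congestion can degrade performance and lead to user complaints. However, network performance monitoring in current live networks works at the second or minute level, and such statistics periods are too long. As a result, long-term statistics often look stable while occasional instantaneous traffic bursts push traffic above the guaranteed bandwidth of a service or the link bandwidth, causing a small amount of packet loss for the service. To better reflect real-time traffic in the network, a more precise millisecond-level network performance monitoring function is needed.
Current millisecond-level performance monitoring methods reuse the second-level approach. Each performance monitoring node (for example, the forwarding plane of a device) collects network performance data at a millisecond-level sampling period. It then either reports each millisecond-level sample to the device's main control unit in real time, or packages the samples and reports them to the main control unit at a certain period. The main control unit reports all of the sampled network performance data to the OSS through Telemetry or the Simple Network Management Protocol (SNMP), over a data communication network (DCN) channel between devices or an out-of-band DCN channel, and the OSS analyzes and presents the data.
With this approach, data is sampled and reported at millisecond granularity, so the volume of network performance data is huge. This places very high demands on the main control CPU and easily overloads it. In addition, reporting such a large volume of data consumes a large amount of DCN bandwidth, which makes large-scale deployment on live networks almost impossible.
In view of this, embodiments of this application provide a network performance monitoring method. With the sampling period set to the millisecond level, the method reduces the dependence of millisecond-level network performance monitoring on the performance of the device's main control CPU while still meeting customers' demand for millisecond-level monitoring. It also reduces the dependence of millisecond-level monitoring on DCN bandwidth resources, which meets the requirement to deploy millisecond-level performance monitoring nodes at scale on live networks.
The following describes the technical solutions provided in the embodiments of this application from multiple perspectives, including the system architecture, the method, the virtual apparatus, the physical apparatus, and the medium.
The following describes the system architecture provided in the embodiments of this application.
Referring to FIG. 1, an embodiment of this application provides a system architecture 100, which is an example of a network performance monitoring system. The system architecture 100 includes at least one network device and a control and management device. Each of the at least one network device is connected to the control and management device through a network (such as a DCN).
Network devices in the system architecture 100 include but are not limited to access network devices, aggregation network devices, and core network devices, and they can be of various types. For example, a network device may be, without limitation, a packet transport network (PTN) device, an agile transport network (ATN) device, an optical switch network (OSN) device, a router, or a switch. Alternatively, a network device may be another type of device that supports performance monitoring, for example, a device that supports millisecond-level performance monitoring. This embodiment does not limit the type of network device. For example, referring to FIG. 1, a network device is the access network device 101, the aggregation network device 102 or 103, the backbone aggregation network device 104 or 105, or the core network device 106 or 107. All or some of the network devices in the system architecture 100 act as performance monitoring nodes and perform method 200 described below. Optionally, the user specifies which network devices in the system architecture 100 act as performance monitoring nodes. For example, to deploy a network device as a performance monitoring node, the user enables its performance monitoring function, and the network device performs method 200 when triggered by the enabling instruction.
The control and management device in the system architecture 100 includes but is not limited to a network management device or a controller. The network management device is, for example, the OSS 110 in FIG. 1. The controller is, for example, an SDN controller in a software defined network (SDN), or a network functions virtualization manager (VNFM) in network function virtualization (NFV). The physical entity of the control and management device is, for example, a host, a server, or a personal computer. This embodiment does not limit the type of the control and management device.
Network devices in the system architecture 100 can communicate with the control and management device in multiple ways. For example, they communicate through Telemetry or SNMP. Telemetry and SNMP are optional; in other embodiments, the network device and the control and management device communicate based on the Network Configuration (NETCONF) protocol.
The system architecture 100 has been described above. The following uses method 200 to describe, by way of example, the procedure for monitoring network performance based on this system architecture.
Referring to FIG. 2, FIG. 2 is a flowchart of a network performance monitoring method 200 according to an embodiment of this application. For example, method 200 includes S201 to S205.
Optionally, method 200 is performed by a network device in the system architecture 100, specifically by the forwarding plane and the control plane of the same network device. The forwarding plane handles the processing in S201, and the control plane handles the processing in S202 to S205.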
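S201. The forwarding plane samples network performance data based on a first time period and records the number of network performance anomalies.
The first time period is the sampling period at which the forwarding plane collects network performance data. Specifically, the forwarding plane samples network performance data once every first time period, and records the number of network performance anomalies based on the sampled data. The granularity, or length, of the first time period can vary. Optionally, the first time period is on the order of milliseconds. For example, if the first time period is 1 millisecond, the forwarding plane collects network performance data once every millisecond. Sampling at a millisecond-level period enables millisecond-level performance monitoring. For example, denote network performance data by $p$. Every $n$ milliseconds, the forwarding plane obtains $n$ samples $p_1, p_2, p_3, \ldots, p_n$, where $p_i$ is the network performance data collected in the $i$-th millisecond and $i$ is a positive integer with $1 \le i \le n$. Optionally, the network performance data is collected internally by the forwarding plane and is not reported to the OSS or presented to the user.
Network performance data indicates network performance, for example, the forwarding performance of the forwarding plane. Optionally, the network performance data includes at least one of the following: delay, packet loss, jitter, bandwidth, transmission rate, bit errors, and errored packets. Optionally, the objects of network performance monitoring include at least one of a physical port, a tunnel, a pseudo wire, or a virtual interface. Correspondingly, the network performance data includes at least one of the network performance data of a physical port, a tunnel, a pseudo wire, or a virtual interface. Optionally, the network performance data of different monitored objects is sampled by different components of the forwarding plane. For example, the physical interface card samples the network performance data of physical ports, and the network processor (NP) samples the network performance data of tunnels and pseudo wires.
The forwarding plane can record the number of network performance anomalies in multiple ways. For example, each time it obtains network performance data by sampling, the forwarding plane determines whether the data meets a preset condition. Each time the data meets the preset condition, it records one network performance anomaly.
The preset condition can be set in multiple ways. Optionally, the preset condition includes: the value of the network performance data obtained by each sample is greater than or equal to a second threshold. The second threshold may be called a performance threshold-crossing limit or a performance degradation threshold. When the preset condition is defined by a threshold, the number of network performance anomalies is also called the number of threshold crossings. The second threshold can be set according to requirements or actual network conditions. It includes but is not limited to at least one of a delay threshold, packet loss threshold, jitter threshold, bandwidth threshold, transmission rate threshold, bit error threshold, or errored packet threshold. For example, if the network performance data is transmission rate, the second threshold is 70%. Optionally, the user presets the second threshold when enabling the network performance monitoring function, or the second threshold is a default value. Taking a first time period of one millisecond as an example, the forwarding plane compares the network performance data collected each millisecond with the second threshold. If the data is greater than or equal to the second threshold, it records one network performance anomaly.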
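The threshold comparison in S201 can be sketched as a per-object counter in Python. This is a minimal illustration under stated assumptions, not the patented implementation. The class name `AnomalyCounter` and the `read_and_reset` method are hypothetical, and so is clearing the counter when the control plane reads it once per second time period. The text only states that one anomaly is recorded for each sample at or above the second threshold.

```python
class AnomalyCounter:
    """Hypothetical forwarding-plane counter for one monitored object."""

    def __init__(self, second_threshold: float) -> None:
        self.second_threshold = second_threshold
        self.count = 0  # anomalies recorded since the last read

    def on_sample(self, value: float) -> None:
        # Called once per first time period (e.g. every 1 ms).
        # Preset condition: value >= second threshold -> one anomaly.
        if value >= self.second_threshold:
            self.count += 1

    def read_and_reset(self) -> int:
        # Assumed to be called by the control plane once per second time
        # period, so each read returns the count for that period only.
        count, self.count = self.count, 0
        return count


# Hypothetical per-millisecond port utilization samples, in percent.
counter = AnomalyCounter(second_threshold=70.0)
for value in [42.0, 71.5, 69.9, 70.0, 88.2]:
    counter.on_sample(value)
print(counter.read_and_reset())  # -> 3
```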
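Optionally, network performance monitoring may have multiple anomaly levels, and the forwarding plane records the number of network performance anomalies for each of these levels. Specifically, the multiple anomaly levels correspond to multiple preset conditions, which may differ from level to level. Each time it obtains network performance data by sampling, the forwarding plane checks the data against each of the preset conditions. For any one of the anomaly levels, each time the sampled data meets the preset condition of that level, the forwarding plane records one network performance anomaly of that level.
The preset condition for each anomaly level can be set in multiple ways. In a possible implementation, multiple third thresholds are set for the multiple anomaly levels. A third threshold may be called the performance threshold-crossing limit or the performance degradation threshold of a level. The preset condition corresponding to an anomaly level includes: the value of the network performance data obtained by each sample is greater than or equal to the third threshold corresponding to that level. The third thresholds may correspond one-to-one to the anomaly levels. A third threshold may be the same as or different from the second threshold described above. The user presets the third threshold of each anomaly level when enabling the network performance monitoring function, or the third thresholds are default values. Third thresholds include but are not limited to at least one of a delay threshold, packet loss threshold, jitter threshold, bandwidth threshold, transmission rate threshold, bit error threshold, or errored packet threshold.
Optionally, the higher the anomaly level, the higher its third threshold. For example, the network performance data is the transmission rate of a physical port, and the third threshold is a transmission rate threshold of the physical port. For two of the anomaly levels, the lower level has a transmission rate threshold of 70% and the higher level has a transmission rate threshold of 85%.
For example, denote a third threshold by $M$, network performance data by $p$, and the number of network performance anomalies by $num$. For the $n$ anomaly levels 1 to $n$, $n$ third thresholds $M_1$ to $M_n$ can be set, and based on them the forwarding plane records $n$ anomaly counts $num_1$ to $num_n$. Here $M_i$ is the third threshold of anomaly level $i$, $num_i$ is the number of network performance anomalies of level $i$, and $i$ is a positive integer with $1 \le i \le n$. Taking a first time period of 1 millisecond as an example, after obtaining sample $p_k$ in the $k$-th millisecond, the forwarding plane compares $p_k$ with each of $M_1$ to $M_n$. If $M_i < p_k < M_{i+1}$, it increments $num_i$ by one. In other words, for two adjacent anomaly levels, if the currently sampled value is greater than the threshold of the lower level and less than the threshold of the higher level, the count of the lower level is incremented by one.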
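The per-level counting described above can be sketched with Python's standard `bisect` module. The sketch makes three assumptions. First, the third thresholds are sorted in ascending order. Second, a value equal to $M_i$ counts toward level $i$, which matches the "greater than or equal to" preset condition rather than the strict inequality in the example. Third, a value at or above the top threshold $M_n$ counts toward level $n$, a case the example does not spell out. The function name `record_level` is hypothetical.

```python
from bisect import bisect_right
from typing import List


def record_level(value: float, thresholds: List[float], counts: List[int]) -> int:
    """Increment the count of the highest level whose third threshold is reached.

    thresholds[i] is M_{i+1}, sorted ascending (higher level, higher threshold).
    counts[i] is num_{i+1}. Returns the 1-based level recorded, or 0 if none.
    """
    level = bisect_right(thresholds, value)  # number of thresholds <= value
    if level > 0:
        counts[level - 1] += 1
    return level


# Two levels from the example: 70 % (lower level) and 85 % (higher level).
thresholds = [70.0, 85.0]
counts = [0, 0]
for p in [60.0, 75.0, 85.0, 90.0, 72.0]:
    record_level(p, thresholds, counts)
print(counts)  # -> [2, 2]
```

Each sample lands in exactly one band here, so a 90 % sample increments only the level-2 count. A variant that increments every level whose threshold is reached would match a literal reading of the per-level "greater than or equal to" condition instead.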
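Optionally, in addition to recording the number of network performance anomalies, the forwarding plane determines network performance parameters within the second time period based on the network performance data obtained by each sample.
The network performance parameters include but are not limited to at least one of the maximum, the minimum, or the average of the network performance data within the second time period. Depending on the specific type of network performance data, the network performance parameters optionally include at least one of maximum delay, minimum delay, average delay, packet loss rate, jitter, bandwidth, transmission rate, bit error rate, or errored packet rate.
The maximum of the network performance data within the second time period can be computed in multiple ways. For example, denote the second time period by $T$ and the maximum within period $T$ by Max. Take a first time period of 1 millisecond as an example. Each time the forwarding plane obtains a sample in a millisecond of period $T$, it compares the sampled value with the recorded Max. If the sampled value is greater than the recorded Max, the forwarding plane updates Max to the sampled value. If the sampled value is less than or equal to the recorded Max, it keeps Max unchanged. This yields the maximum of the network performance data within period $T$.
The minimum of the network performance data within the second time period can be computed in multiple ways. For example, denote the second time period by $T$ and the minimum within period $T$ by Min. Take a first time period of 1 millisecond as an example. Each time the forwarding plane obtains a sample in a millisecond of period $T$, it compares the sampled value with the recorded Min. If the sampled value is less than the recorded Min, the forwarding plane updates Min to the sampled value. If the sampled value is greater than or equal to the recorded Min, it keeps Min unchanged. This yields the minimum of the network performance data within period $T$.
The average of the network performance data within the second time period can be computed in multiple ways. For example, denote the second time period by $T$ and the average within period $T$ by Avg. Taking a first time period of 1 millisecond as an example, the forwarding plane averages the values sampled in each millisecond of period $T$ to obtain the average Avg within period $T$.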
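The Max/Min/Avg updates above amount to a running computation that needs constant memory per monitored object. The minimal Python sketch below makes two assumptions: the average is the arithmetic mean of all samples in period $T$ (the text does not specify the averaging method), and a fresh `PeriodStats` object (a hypothetical name) is started for each period.

```python
import math
from typing import Optional, Tuple


class PeriodStats:
    """Running maximum, minimum and average for one second time period T."""

    def __init__(self) -> None:
        self.max_value = -math.inf  # Max: largest sample seen so far
        self.min_value = math.inf  # Min: smallest sample seen so far
        self.total = 0.0  # running sum, used for Avg
        self.samples = 0

    def update(self, value: float) -> None:
        # Called once per first time period with the newly sampled value.
        if value > self.max_value:
            self.max_value = value
        if value < self.min_value:
            self.min_value = value
        self.total += value
        self.samples += 1

    def result(self) -> Optional[Tuple[float, float, float]]:
        # (Max, Min, Avg) for period T, or None if nothing was sampled.
        if self.samples == 0:
            return None
        return self.max_value, self.min_value, self.total / self.samples


stats = PeriodStats()
for delay_ms in [1.2, 0.8, 3.5, 1.0]:  # hypothetical per-millisecond delay samples
    stats.update(delay_ms)
print(stats.result())
```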
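S202. The control plane determines that the number of network performance anomalies within a second time period is greater than a first threshold.
The second time period is longer than the first time period. The granularity, or length, of the second time period can vary. Optionally, when the first time period is on the order of milliseconds, the second time period is at least on the order of seconds. In other words, the second time period is at least one second long, for example, 1 second, 10 seconds, 30 seconds, 1 minute, 5 minutes, 15 minutes, 30 minutes, or 1 hour. Optionally, the user presets the duration of the second time period when enabling the network performance monitoring function, or the duration is a default value, for example, 30 seconds.
The first threshold may be called an alarm threshold. Optionally, the user presets the first threshold when enabling the network performance monitoring function, or the first threshold is a default value. Once every second time period, the control plane compares the number of network performance anomalies within that period with the first threshold. If the number recorded by the forwarding plane is greater than the first threshold, the control plane generates an alarm. For example, denote the second time period by $T$, the number of network performance anomalies by $num$, and the first threshold by Alam-num. The control plane compares $num$ with Alam-num. If $num$ is greater than Alam-num, the number of anomalies within the second time period has exceeded the alarm threshold, and the control plane reports that an alarm has been generated.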
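The control-plane check in S202 reduces to one comparison per second time period. In the minimal Python sketch below, a list of per-period counts stands in for however the counts are obtained (read from memory or pushed by the forwarding plane). The function name and the use of a plain list are assumptions for illustration. A count exactly equal to the first threshold does not trigger an alarm.

```python
from typing import Iterable, List


def alarm_decisions(period_counts: Iterable[int], alarm_num: int) -> List[bool]:
    """Return one alarm decision per second time period T.

    period_counts yields num, the anomaly count the forwarding plane recorded
    in each period; alarm_num plays the role of the first threshold Alam-num.
    An alarm is generated only when num is strictly greater than alarm_num.
    """
    return [num > alarm_num for num in period_counts]


print(alarm_decisions([3, 12, 10], alarm_num=10))  # -> [False, True, False]
```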
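As the relationship between the first and second time periods shows, the forwarding plane samples data and records counts at a fine granularity, while the control plane reports at a coarse granularity. By implementing this embodiment, the forwarding plane and the control plane cooperate to monitor network performance. On the one hand, the need for fine-grained network performance monitoring is met. On the other hand, the dependence on CPU performance and the bandwidth consumption caused by massive data reporting from the control plane are reduced.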
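A rough calculation shows the scale of the reduction. It assumes the example values from this section (a 1 ms first time period and the 30 s default second time period) and counts reported values rather than bytes. Under those assumptions, reporting every sample sends one value per millisecond per monitored object. The count-based scheme sends one anomaly count per object per period (per anomaly level, if levels are used). The helper name is hypothetical.

```python
from typing import Tuple


def values_reported_per_period(first_period_ms: int, second_period_ms: int) -> Tuple[int, int]:
    """Return (values sent if every sample is reported, values sent as one count)."""
    samples_per_period = second_period_ms // first_period_ms
    per_sample_scheme = samples_per_period  # one value per first time period
    count_scheme = 1  # one anomaly count per monitored object and level
    return per_sample_scheme, count_scheme


print(values_reported_per_period(first_period_ms=1, second_period_ms=30_000))  # -> (30000, 1)
```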
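In this embodiment, the control plane can obtain the number of network performance anomalies from the forwarding plane in multiple ways. Implementations 1 and 2 below are examples.
Implementation 1: the control plane actively reads the number of network performance anomalies.
Optionally, after recording the number of network performance anomalies, the forwarding plane saves it to a memory, and the control plane reads it from that memory. The memory that stores the count can vary. For example, it is a memory on the interface board where the forwarding plane resides, such as a register on the physical interface card. As another example, it is a memory on the board where the control plane resides (such as the main control board). This embodiment does not limit the type of memory.
Implementation 2: the control plane receives the number of network performance anomalies reported by the forwarding plane.
Optionally, after recording the number of network performance anomalies, the forwarding plane sends it to the control plane, and the control plane receives it.
S203. The control plane generates an alarm.
The alarm generated by the control plane can take multiple forms. Cases A and B below are examples.
Case A: the alarm includes an alarm indication signal. The signal can be output as the on/off state or blinking frequency of an alarm indicator, as alarm audio, or in other data forms. For example, the alarm indication signal may be an audible alert.
Case B: the alarm includes alarm information indicating that the number of network performance anomalies is greater than the first threshold. The alarm information can have various contents, for example, at least one of an alarm type, alarm source information, or a timestamp. Each is described below.
The alarm type includes at least one of delay anomaly, packet loss anomaly, jitter anomaly, bandwidth anomaly, transmission rate anomaly, bit error anomaly, or errored packet anomaly. When the alarm type is delay anomaly, the alarm information indicates that the number of delay anomalies is greater than the first threshold. When the alarm type is transmission rate anomaly, the alarm information indicates that the number of transmission rate anomalies is greater than the first threshold, for example, that the number of times the port rate was too low is greater than the first threshold. The other alarm types work the same way. Carrying the alarm type in the alarm information identifies the specific network performance anomaly event, that is, which kind of network performance data is abnormal.
The alarm source information indicates the network device that generated the alarm information, for example, the name or the Internet Protocol (IP) address of the network device where the control plane resides. Carrying it in the alarm information identifies which network device in the network detected the network performance anomaly.
The timestamp indicates the point in time at which the number of network performance anomalies exceeded the first threshold. For example, when the control plane determines that the number of anomalies within the second time period is greater than the first threshold, it can write a timestamp of the current time into the alarm information. Carrying the timestamp indicates when the control plane detected the anomaly.
Optionally, when multiple anomaly levels are set for network performance monitoring, the alarm information also includes the anomaly level. Alarm information differs between anomaly levels, so that it indicates which level of anomaly event has occurred. For example, the alarm information includes an alarm name, and alarm information of different levels has different alarm names. As another example, the alarm information includes alarm parameters, and alarm information of different levels has different alarm parameters.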
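The alarm information of case B can be modelled as a small record. The Python sketch below is illustrative only and is not a format defined by this application. The class and field names are assumptions, as is the `Alam<level>` naming scheme, which is borrowed from the $Alam_n$ notation used in the examples. The timestamp is also assumed to be a Unix timestamp.

```python
import time
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class AlarmType(Enum):
    DELAY = "delay anomaly"
    PACKET_LOSS = "packet loss anomaly"
    JITTER = "jitter anomaly"
    BANDWIDTH = "bandwidth anomaly"
    TRANSMISSION_RATE = "transmission rate anomaly"
    BIT_ERROR = "bit error anomaly"
    ERRORED_PACKET = "errored packet anomaly"


@dataclass
class AlarmInfo:
    alarm_type: AlarmType  # which kind of network performance data is abnormal
    source: str  # name or IP address of the device that generated the alarm
    level: Optional[int] = None  # anomaly level, only when levels are configured
    timestamp: float = field(default_factory=time.time)  # when the threshold was exceeded

    @property
    def name(self) -> str:
        # Different anomaly levels get different alarm names, e.g. "Alam3".
        return "Alam" if self.level is None else f"Alam{self.level}"


alarm = AlarmInfo(AlarmType.TRANSMISSION_RATE, source="192.0.2.1", level=3)
print(alarm.name, alarm.alarm_type.value)  # -> Alam3 transmission rate anomaly
```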
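Specifically, the control plane compares the number of network performance anomalies of each anomaly level within the second time period with the first threshold. For any one of the levels, if the control plane determines that the number of anomalies of that level within the second time period is greater than the first threshold, it generates alarm information indicating that level. For example, denote the second time period by $T$, the number of anomalies of level $n$ by $num_n$, and the first threshold by Alam-num. The control plane compares $num_n$ with Alam-num. If it determines that $num_n$ is greater than Alam-num within $T$, it generates $Alam_n$, the alarm information indicating anomaly level $n$.
Optionally, if the control plane determines that the numbers of anomalies of multiple levels within the second time period are all greater than the first threshold, it generates alarm information indicating the highest of those levels. For example, with the first threshold denoted Alam-num, if the counts of anomaly levels 1, 2, and 3 are all greater than Alam-num, the control plane generates $Alam_3$, the alarm information indicating anomaly level 3.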
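Selecting the alarm to report can be sketched as a scan from the highest level downwards. This Python sketch is one reading of the text. It reports only the highest level whose count exceeds the first threshold and suppresses every lower level that also exceeds it, which covers the case where all levels exceed the threshold. The function name is hypothetical.

```python
from typing import List, Optional


def highest_alarm_level(counts: List[int], alarm_num: int) -> Optional[int]:
    """Return the highest 1-based anomaly level whose count exceeds alarm_num.

    counts[i] is num_{i+1}, the anomaly count of level i + 1 in period T.
    Lower levels that also exceed alarm_num are suppressed. Returns None
    when no level exceeds the threshold.
    """
    for level in range(len(counts), 0, -1):  # scan from the highest level down
        if counts[level - 1] > alarm_num:
            return level
    return None


level = highest_alarm_level([15, 12, 11], alarm_num=10)
print(f"Alam{level}")  # -> Alam3
```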
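In this way, if multiple anomaly levels meet their corresponding thresholds (third thresholds) at the same time, the control plane can report only the alarm information of the highest level to the upper-layer OSS and suppress reporting of the lower levels. This reduces the amount of reported alarm information and prevents excessive alarms from disturbing the user.
S204. The control plane sends the alarm to a control and management device.
For example, the control plane reports the alarm to the control and management device through Telemetry or SNMP, over a DCN channel between devices or an out-of-band DCN channel. For example, in the system architecture 100 shown in FIG. 1, the network device where the control plane resides is the access network device 101, which sends the alarm to the OSS 110 after generating it. After receiving the alarm, the control and management device can analyze and present it. By reporting the alarm, the control plane promptly notifies the operator that a performance anomaly has occurred in the network. This helps the operator adjust the network in time, so that the anomaly is resolved promptly and user experience is not affected.
Optionally, after obtaining the network performance parameters within the second time period, the control plane also sends them to the control and management device, for example, the maximum, minimum, and average of the network performance data within the second time period. Optionally, the control plane reports the network performance parameters through Telemetry or SNMP, over a DCN channel between devices or an out-of-band DCN channel. After receiving them, the control and management device can analyze and present them.
The timing, or trigger condition, for reporting network performance parameters can vary. Cases I and II below are examples.
Case I: the control plane reports network performance parameters based on the second time period. In other words, the control plane sends the network performance parameters to the control and management device once every second time period. Optionally, in case I, reporting of the parameters does not depend on reporting of alarms. The control plane reports the parameters of each second time period whether the number of anomalies in that period is greater than, less than, or equal to the first threshold.
Case II: the control plane reports network performance parameters when it reports an alarm. In other words, when the control plane determines that the number of network performance anomalies within the second time period is greater than the first threshold, it sends the control and management device both the alarm and the network performance parameters of that second time period.
In this manner, the second time period may be the reporting period at which the control plane sends network performance parameters.
It should be understood that S204 is optional; in other embodiments, the control plane does not perform S204. For example, the control plane uses an audible alert to inform the user that a network performance anomaly has been detected.
Optionally, the control plane also reports the number of network performance anomalies. Specifically, after obtaining the count, the control plane sends it to the control and management device, which receives, analyzes, and presents it. When network performance monitoring has multiple anomaly levels, the control plane optionally sends the count of each anomaly level. When the network performance data includes at least one of delay, packet loss, jitter, bandwidth, transmission rate, bit errors, and errored packets, the control plane optionally sends at least one of the numbers of delay anomalies, packet loss anomalies, jitter anomalies, bandwidth anomalies, transmission rate anomalies, bit error anomalies, and errored packet anomalies.
The timing, or trigger condition, for reporting the number of network performance anomalies can vary. Cases a and b below are examples.
Case a: the control plane reports the number of network performance anomalies based on the second time period. In other words, the control plane sends the count to the control and management device once every second time period. Optionally, in case a, reporting of the count does not depend on reporting of alarms. The control plane reports the count of each second time period whether that count is greater than, less than, or equal to the first threshold.
Case b: the control plane reports the number of network performance anomalies when it reports an alarm. In other words, when the control plane determines that the number of network performance anomalies within the second time period is greater than the first threshold, it sends the control and management device both the alarm and the count of that second time period.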
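The two reporting timings, cases a and b, differ only in the condition under which a per-period report is emitted. Cases I and II work the same way for network performance parameters. In this Python sketch, the policy strings `"periodic"` and `"on_alarm"` and the dictionary report format are hypothetical.

```python
from typing import Dict, List


def build_reports(period_counts: List[int], alarm_num: int, policy: str) -> List[Dict[str, object]]:
    """Return the reports sent to the control and management device.

    policy "periodic": case a, report every second time period.
    policy "on_alarm": case b, report only together with an alarm.
    """
    if policy not in ("periodic", "on_alarm"):
        raise ValueError(f"unknown policy: {policy}")
    reports: List[Dict[str, object]] = []
    for period, num in enumerate(period_counts):
        alarm = num > alarm_num  # same strict comparison as in S202
        if policy == "periodic" or alarm:
            reports.append({"period": period, "count": num, "alarm": alarm})
    return reports


print(len(build_reports([3, 12, 10], alarm_num=10, policy="periodic")))  # -> 3
print(build_reports([3, 12, 10], alarm_num=10, policy="on_alarm"))
```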
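S205. If the number of network performance anomalies is less than the first threshold in each of multiple consecutive second time periods following the second time period, the control plane cancels the alarm.
Optionally, the control plane supports alarm cancellation. Specifically, after the control plane determines that the number of anomalies within a second time period is greater than the first threshold and generates an alarm, the forwarding plane continues to sample network performance data based on the first time period and to record the number of anomalies. Based on the recorded counts, the control plane keeps checking whether the count in each subsequent second time period is less than the first threshold. If the count is less than the first threshold in each of multiple consecutive second time periods following the period in which the anomaly occurred, the control plane cancels the alarm. Optionally, the control plane cancels the alarm by sending the control and management device a notification that the alarm has cleared. For example, denote the second time period by $T$. After the control plane reports alarm $Alam_n$ in period $T_i$, if the counts recorded in each of the $N$ consecutive periods $T_{i+1}$ to $T_{i+N}$ are all less than the threshold Alam-num, the control plane reports that alarm $Alam_n$ has cleared. Optionally, the user presets $N$ when enabling the network performance monitoring function, or $N$ is a default value, for example, 3. Cancelling alarms based on the anomaly counts of multiple consecutive periods prevents stale alarms from lingering. The cancellation also notifies the operator that network performance has returned to normal and that the fault that caused the anomaly has been recovered. It should be understood that S205 is optional; in other embodiments, the control plane does not perform S205.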
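Raising and clearing can be combined in a small state machine driven by one anomaly count per second time period. The Python sketch below uses $N = 3$ by default, as in the example. It treats a period whose count equals the first threshold as breaking the run of quiet periods, because such a count is neither greater than nor less than the threshold. That handling, and the class and method names, are assumptions.

```python
class AlarmTracker:
    """Raise an alarm when a period's count exceeds alarm_num; clear it after
    n_clear consecutive periods whose count is below alarm_num."""

    def __init__(self, alarm_num: int, n_clear: int = 3) -> None:
        self.alarm_num = alarm_num  # first threshold (Alam-num)
        self.n_clear = n_clear  # N consecutive quiet periods needed to clear
        self.active = False
        self.quiet_periods = 0

    def on_period(self, num: int) -> str:
        """Feed the anomaly count of one second time period; return the event."""
        if num > self.alarm_num:
            self.quiet_periods = 0
            if not self.active:
                self.active = True
                return "raise"
            return "none"
        if self.active:
            if num < self.alarm_num:
                self.quiet_periods += 1
                if self.quiet_periods >= self.n_clear:
                    self.active = False
                    self.quiet_periods = 0
                    return "clear"
            else:
                # num == alarm_num: not "less than", so the quiet run restarts.
                self.quiet_periods = 0
        return "none"


tracker = AlarmTracker(alarm_num=10)
print([tracker.on_period(n) for n in [12, 15, 4, 5, 3, 2]])
# -> ['raise', 'none', 'none', 'none', 'clear', 'none']
```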
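In the method provided in this embodiment, the forwarding plane samples network performance data at a fine-grained time period and records the number of network performance anomalies. At a coarse-grained time period, the control plane generates an alarm when the number of anomalies recorded by the forwarding plane is greater than a threshold. This meets the need for fine-grained network performance monitoring, and because the control plane does not need to report all of the collected network performance data, the amount of data it has to report is greatly reduced. On the one hand, this solves the problem of the main control CPU being overloaded by massive data reporting and reduces the dependence of network performance monitoring on the performance of the device's main control CPU. On the other hand, it solves the problem of massive data reporting occupying a large amount of bandwidth, reduces the dependence of network performance monitoring on bandwidth resources, and helps meet the need to deploy performance monitoring nodes at scale on live networks.
Method 200 of the embodiments of this application has been described above. The following describes the network device of the embodiments of this application. It should be understood that the network device described below has any function of the forwarding plane and the control plane in method 200.
FIG. 3 is a schematic structural diagram of a network device 300 according to an embodiment of this application. As shown in FIG. 3, the network device 300 includes a sampling module 301, a recording module 302, a determining module 303, and a generating module 304. The sampling module 301 is configured to perform the sampling step in S201, the recording module 302 is configured to perform the recording step in S201, the determining module 303 is configured to perform S202, and the generating module 304 is configured to perform S203.
Optionally, the network device 300 further includes a sending module, configured to perform S204.
Optionally, the network device 300 further includes a cancellation module, configured to perform S205.
It should be understood that the sampling module 301 and the recording module 302 correspond to the forwarding plane in method 200 and implement the steps and methods performed by the forwarding plane in method 200. In other words, the sampling module 301 and the recording module 302 share the same concept as the forwarding plane. For their specific implementation, see the forwarding-plane procedure in method 200. Details are not repeated here.
It should be understood that the determining module 303, the generating module 304, the sending module, and the cancellation module correspond to the control plane in method 200 and implement the steps and methods performed by the control plane in method 200. In other words, these modules share the same concept as the control plane. For their specific implementation, see the control-plane procedure in method 200. Details are not repeated here.
It should be understood that each functional module in the network device 300 is implemented in software. For example, the sampling module 301 and the recording module 302 are virtual modules generated after the processor of the forwarding plane reads program code. The determining module 303, the generating module 304, the sending module, and the cancellation module are virtual modules generated after the processor of the control plane reads program code.
It should be understood that the division of the network device 300 into the foregoing functional modules for network performance monitoring is only an example. In practice, the functions may be allocated to different functional modules as required. That is, the internal structure of the forwarding plane or the control plane may be divided into different functional modules to complete all or some of the functions described above.
Corresponding to the method embodiments and virtual apparatus embodiments provided in this application, an embodiment of this application further provides a network device 400. The hardware structure of the network device 400 is described below.
The network device 400 corresponds to the forwarding plane and the control plane in method 200. The hardware, modules, and other operations and/or functions in the network device 400 implement the steps and methods performed by the forwarding plane and the control plane in method 200. For the detailed procedure by which the network device 400 monitors network performance, see method 200; for brevity, the details are not repeated here. The steps of method 200 are completed by integrated logic circuits of hardware in the processor of the network device 400, or by instructions in the form of software. The steps of the methods disclosed with reference to the embodiments of this application may be performed directly by a hardware processor, or by a combination of hardware and software modules in the processor. The software modules may be located in a storage medium well established in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory, and the processor reads information from the memory and completes the steps of the foregoing method in combination with its hardware. To avoid repetition, details are not described here.
The network device 400 corresponds to the network device 300 in the foregoing virtual apparatus embodiment, and each functional module in the network device 300 is implemented by software of the network device 400. In other words, the functional modules of the network device 300 are generated after the processor of the network device 400 reads the program code stored in the memory.
Referring to FIG. 4, FIG. 4 is a schematic structural diagram of a network device according to an exemplary embodiment of this application. The network device 400 includes a main control board 410 and an interface board 430.
The main control board is also called a main processing unit (MPU) or a route processor card. The main control board 410 controls and manages the components of the network device 400, including route computation, device management, device maintenance, and protocol processing. The main control board 410 includes a central processing unit 411 and a memory 412.
The interface board 430 is also called a line processing unit (LPU), a line card, or a service board. The interface board 430 provides various service interfaces and forwards data packets. Service interfaces include but are not limited to Ethernet interfaces and POS (Packet over SONET/SDH) interfaces. An Ethernet interface is, for example, a Flexible Ethernet client (FlexE Client) interface. The interface board 430 includes a central processing unit 431, a network processor 432, a forwarding entry memory 434, and a physical interface card (PIC) 433.
The central processing unit 431 on the interface board 430 controls and manages the interface board 430 and communicates with the central processing unit 411 on the main control board 410.
The network processor 432 implements packet forwarding and may take the form of a forwarding chip. Specifically, the network processor 432 forwards received packets based on the forwarding table stored in the forwarding entry memory 434. If the destination address of a packet is the address of the network device 400, the packet is sent up to the CPU (such as the central processing unit 411) for processing. Otherwise, the next hop and outbound interface corresponding to the destination address are looked up in the forwarding table, and the packet is forwarded to that outbound interface. Processing of upstream packets includes inbound-interface processing and forwarding table lookup. Processing of downstream packets includes forwarding table lookup, and so on.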
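The forwarding decision above can be sketched with Python's standard `ipaddress` module. The text only says that the next hop and outbound interface are looked up by destination address. This sketch substitutes longest-prefix match, the conventional IP lookup, as an assumption. The table layout, the interface names, and the choice to drop packets that match no entry are also assumptions.

```python
import ipaddress
from typing import List, Optional, Tuple

# Hypothetical forwarding table entry: (destination prefix, next hop, outbound interface).
Route = Tuple[ipaddress.IPv4Network, str, str]


def forward(dst: str, local_addrs: List[str], table: List[Route]) -> Tuple[str, Optional[str], Optional[str]]:
    """Return (action, next_hop, out_interface) for a packet addressed to dst."""
    addr = ipaddress.IPv4Address(dst)
    if addr in {ipaddress.IPv4Address(a) for a in local_addrs}:
        return ("to_cpu", None, None)  # addressed to this device: send up to the CPU
    best: Optional[Route] = None
    for prefix, next_hop, out_if in table:
        # Longest-prefix match: prefer the most specific matching entry.
        if addr in prefix and (best is None or prefix.prefixlen > best[0].prefixlen):
            best = (prefix, next_hop, out_if)
    if best is None:
        return ("drop", None, None)  # assumption: no matching entry
    return ("forward", best[1], best[2])


table = [
    (ipaddress.IPv4Network("10.0.0.0/8"), "192.0.2.1", "GE0/0/1"),
    (ipaddress.IPv4Network("10.1.0.0/16"), "192.0.2.2", "GE0/0/2"),
]
print(forward("10.1.2.3", ["198.51.100.1"], table))  # -> ('forward', '192.0.2.2', 'GE0/0/2')
```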
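The physical interface card 433 implements physical-layer interconnection: raw traffic enters the interface board 430 through it, and processed packets are sent out from it. The physical interface card 433, also called a daughter card, can be installed on the interface board 430. It converts optical and electrical signals into packets, checks the validity of the packets, and forwards them to the network processor 432 for processing. In some embodiments, the central processing unit can also perform the functions of the network processor 432, for example, by implementing software forwarding on a general-purpose CPU, in which case the network processor 432 is not required.
Optionally, the network device 400 includes multiple interface boards. For example, the network device 400 further includes an interface board 440, which includes a central processing unit 441, a network processor 442, a forwarding entry memory 444, and a physical interface card 443.
Optionally, the network device 400 further includes a switching network board 420, which may also be called a switch fabric unit (SFU). When the network device has multiple interface boards, the switching network board 420 exchanges data between them. For example, the interface board 430 and the interface board 440 can communicate through the switching network board 420.
The main control board 410 is coupled to the interface board 430. For example, the main control board 410, the interface boards 430 and 440, and the switching network board 420 are connected to the system backplane through a system bus for intercommunication. In a possible implementation, an inter-process communication (IPC) channel is established between the main control board 410 and the interface board 430, and the two communicate through the IPC channel.
Logically, the network device 400 includes a control plane and a forwarding plane. The control plane includes the main control board 410 and the central processing unit 431. The forwarding plane includes the components that perform forwarding, such as the forwarding entry memory 434, the physical interface card 433, and the network processor 432. The control plane performs functions such as route computation, forwarding table generation, processing of signaling and protocol packets, and configuration and maintenance of device status. The control plane delivers the generated forwarding table to the forwarding plane. On the forwarding plane, the network processor 432 looks up that table and forwards the packets received by the physical interface card 433 accordingly. The forwarding table delivered by the control plane may be stored in the forwarding entry memory 434.
When method 200 is implemented, the interface board 430 or the interface board 440 performs the forwarding-plane steps. Take physical ports as the monitored object as an example. The physical interface card 433 samples the network performance data of the physical ports in the physical interface card 433 based on the first time period, records the number of network performance anomalies, and saves the count to the forwarding entry memory 434. Now take tunnels as the monitored object. The network processor 432 samples the network performance data of the tunnels based on the first time period, records the number of network performance anomalies, and saves the count to the forwarding entry memory 434.
When method 200 is implemented, the main control board 410 performs the control-plane steps. For example, the central processing unit 431 reads the number of network performance anomalies from the forwarding entry memory 434, determines that the number within the second time period is greater than the first threshold, and generates an alarm.
It should be understood that the sampling module 301 and the recording module 302 in the network device 300 correspond to the interface board 430 or the interface board 440 in the network device 400. The determining module 303, the generating module 304, the sending module, and the cancellation module in the network device 300 may correspond to the main control board 410.
It should be understood that, in the embodiments of this application, operations on the interface board 440 are the same as those on the interface board 430. For brevity, details are not repeated.
It should be noted that there may be one or more main control boards. When there are several, they may include an active main control board and a standby main control board. There may be one or more interface boards, and the stronger the data processing capability of the network device, the more interface boards it provides. An interface board may also carry one or more physical interface cards. There may be no switching network board, or there may be one or more. When there are several, they can jointly implement load sharing and redundant backup. Under a centralized forwarding architecture, the network device may not need a switching network board, and the interface board handles the service data processing of the entire system. Under a distributed forwarding architecture, the network device may have at least one switching network board, through which data is exchanged between multiple interface boards, providing large-capacity data exchange and processing capabilities. Therefore, the data access and processing capabilities of a network device with a distributed architecture are greater than those of a device with a centralized architecture. Optionally, the network device may also have only one board, that is, no switching network board, with the functions of the interface board and the main control board integrated on that single board. In this case, the central processing unit on the interface board and the central processing unit on the main control board can be combined into one central processing unit on the board, which performs the combined functions of both. A device in this form has relatively low data exchange and processing capability (for example, a low-end switch, router, or other network device). Which architecture is used depends on the specific networking deployment scenario and is not limited here.
In some possible embodiments, the forwarding plane and the control plane may also be implemented by a computer program product. Specifically, an embodiment of this application provides a computer program product. When the computer program product runs on a network device, the forwarding plane and the control plane of the network device respectively perform the network performance monitoring method in method 200.
It should be understood that the forwarding plane and the control plane in the various product forms described above respectively have any function of the forwarding plane and the control plane in method 200. Details are not repeated here.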
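A person of ordinary skill in the art may be aware that the method steps and units described with reference to the embodiments disclosed herein can be implemented by electronic hardware, computer software, or a combination of the two. To clearly describe the interchangeability of hardware and software, the foregoing description has generally described the steps and composition of each embodiment in terms of functions. Whether these functions are performed by hardware or software depends on the particular application and design constraints of the technical solution. A person of ordinary skill in the art may use different methods to implement the described functions for each particular application, but such implementation should not be considered to go beyond the scope of this application.
A person skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the system, apparatus, and units described above are the corresponding processes in the foregoing method embodiments. Details are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. The division into units is merely a logical function division and may be different in actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the displayed or discussed mutual couplings, direct couplings, or communication connections may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
Units described as separate components may or may not be physically separate, and components displayed as units may or may not be physical units. That is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments of this application.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on such an understanding, the technical solutions of this application essentially, or the part contributing to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods in the embodiments of this application. The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The foregoing descriptions are merely specific implementations of this application, and the protection scope of this application is not limited thereto. Any equivalent modification or replacement readily figured out by a person skilled in the art within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.
All or some of the foregoing embodiments may be implemented by software, hardware, firmware, or any combination thereof. When software is used, all or some of the embodiments may be implemented in the form of a computer program product. The computer program product includes one or more computer program instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions according to the embodiments of this application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer program instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired or wireless manner. The computer-readable storage medium may be any usable medium accessible by a computer, or a data storage device, such as a server or a data center, integrating one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a digital video disc (DVD)), a semiconductor medium (for example, a solid-state drive), or the like.
A person of ordinary skill in the art may understand that all or some of the steps of the foregoing embodiments may be implemented by hardware, or by a program instructing relevant hardware. The program may be stored in a computer-readable storage medium, which may be a read-only memory, a magnetic disk, an optical disc, or the like.
In this application, the terms "first", "second", and the like are used to distinguish between identical or similar items whose functions are basically the same. It should be understood that "first" and "second" imply no logical or temporal dependency and limit neither quantity nor execution order. It should also be understood that although the following description uses the terms first, second, and so on to describe various elements, these elements should not be limited by the terms. The terms are used only to distinguish one element from another. For example, without departing from the scope of the various described examples, a first threshold could be termed a second threshold, and similarly, a second threshold could be termed a first threshold. Both the first threshold and the second threshold may be thresholds, and in some cases they may be separate and different thresholds.
In this application, "at least one" means one or more, and "multiple" means two or more. For example, multiple second packets means two or more second packets. The terms "system" and "network" are often used interchangeably herein.
It should also be understood that the term "if" may be interpreted as "when", "upon", "in response to determining", or "in response to detecting". Similarly, depending on the context, the phrase "if it is determined that..." or "if [a stated condition or event] is detected" may be interpreted as "upon determining...", "in response to determining...", "upon detecting [the stated condition or event]", or "in response to detecting [the stated condition or event]".
The foregoing descriptions are merely optional embodiments of this application and are not intended to limit this application. Any modification, equivalent replacement, improvement, or the like made within the spirit and principle of this application shall fall within the protection scope of this application.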

Claims (24)

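  1. A network performance monitoring method, wherein the method comprises:
    sampling, by a forwarding plane, network performance data based on a first time period, and recording a number of network performance anomalies, wherein each time network performance data obtained by a sample meets a preset condition, one network performance anomaly is recorded, and the first time period is a sampling period at which the forwarding plane collects network performance data;
    determining, by a control plane, that the number of network performance anomalies within a second time period is greater than a first threshold, wherein a duration of the second time period is greater than a duration of the first time period; and
    generating, by the control plane, an alarm.
  2. The method according to claim 1, wherein the first time period is on the order of milliseconds, and the second time period is at least on the order of seconds.
  3. The method according to claim 1 or 2, wherein the preset condition comprises:
    a value of the network performance data obtained by each sample is greater than or equal to a second threshold.
  4. The method according to any one of claims 1 to 3, wherein
    the recording a number of network performance anomalies comprises: recording a number of network performance anomalies corresponding to each of multiple anomaly levels, wherein the multiple anomaly levels respectively correspond to multiple preset conditions, and each time network performance data obtained by a sample meets the preset condition corresponding to an anomaly level, one network performance anomaly corresponding to that anomaly level is recorded.
  5. The method according to claim 4, wherein the determining, by a control plane, that the number of network performance anomalies within a second time period is greater than a first threshold comprises:
    determining, by the control plane, that the numbers of network performance anomalies corresponding to the multiple anomaly levels within the second time period are all greater than the first threshold; and
    the generating, by the control plane, an alarm comprises:
    generating, by the control plane, alarm information indicating a highest anomaly level among the multiple anomaly levels.
  6. The method according to claim 4 or 5, wherein the preset condition corresponding to an anomaly level comprises: a value of the network performance data obtained by each sample is greater than or equal to a third threshold corresponding to the anomaly level, and a higher anomaly level corresponds to a higher third threshold.
  7. The method according to claim 1, wherein the network performance data comprises at least one of the following: delay, packet loss, jitter, bandwidth, transmission rate, bit errors, and errored packets.
  8. The method according to claim 7, wherein the method further comprises:
    determining, by the forwarding plane based on the network performance data obtained by each sample, a network performance parameter within the second time period; and
    obtaining, by the control plane, the network performance parameter.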
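  9. The method according to claim 8, wherein the network performance parameter comprises at least one of the following:
    maximum delay;
    minimum delay;
    average delay;
    packet loss rate;
    jitter;
    bandwidth;
    transmission rate;
    bit error rate; and
    errored packet rate.
  10. The method according to claim 1, wherein after the generating, by the control plane, an alarm, the method further comprises:
    sending, by the control plane, the alarm to a control and management device.
  11. The method according to claim 10, wherein after the sending, by the control plane, the alarm to a control and management device, the method further comprises:
    if the number of network performance anomalies is less than the first threshold in each of multiple consecutive second time periods following the second time period, cancelling, by the control plane, the alarm.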
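  12. A network device, characterized in that the network device comprises:
    a sampling module, configured to sample network performance data based on a first time period, the first time period being the sampling period for collecting network performance data;
    a recording module, configured to record a quantity of network performance anomalies, wherein each time the network performance data obtained by a sample meets a preset condition, one network performance anomaly is recorded;
    a determining module, configured to determine that the quantity of network performance anomalies within a second time period is greater than a first threshold, the duration of the second time period being greater than the duration of the first time period; and
    a generating module, configured to generate an alarm.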
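  13. The device according to claim 12, wherein the first time period is on the order of milliseconds and the second time period is at least on the order of seconds.
  14. The device according to claim 12 or 13, wherein the preset condition comprises:
    the value of the network performance data obtained by each sample is greater than or equal to a second threshold.
  15. The device according to any one of claims 12 to 14, wherein the recording module is configured to record, for each of a plurality of anomaly levels, the quantity of network performance anomalies corresponding to that anomaly level, the plurality of anomaly levels respectively corresponding to a plurality of preset conditions, wherein each time the network performance data obtained by a sample meets the preset condition corresponding to an anomaly level, one network performance anomaly of that anomaly level is recorded.
  16. The device according to claim 15, wherein the determining module is configured to determine that the quantities of network performance anomalies corresponding to the plurality of anomaly levels within the second time period are all greater than the first threshold; and
    the generating module is configured to generate alarm information indicating the highest anomaly level among the plurality of anomaly levels.
  17. The device according to claim 15 or 16, wherein the preset condition corresponding to an anomaly level comprises: the value of the network performance data obtained by each sample is greater than or equal to a third threshold corresponding to the anomaly level, and the higher the anomaly level, the higher its third threshold.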
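  18. The device according to claim 12, wherein the network performance data comprises at least one of the following: delay, packet loss, jitter, bandwidth, transmission rate, bit errors, and errored packets.
  19. The device according to claim 18, wherein the determining module is further configured to determine, based on the network performance data obtained by each sample, network performance parameters within the second time period.
  20. The device according to claim 19, wherein the network performance parameters comprise at least one of the following:
    maximum delay;
    minimum delay;
    average delay;
    packet loss rate;
    jitter;
    bandwidth;
    transmission rate;
    bit error rate; and
    packet error rate.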
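  21. The device according to claim 12, wherein the device further comprises:
    a sending module, configured to send the alarm to a control and management device.
  22. The device according to claim 21, wherein the device further comprises:
    a cancellation module, configured to cancel the alarm if the quantity of network performance anomalies in each of a plurality of consecutive second time periods following the second time period is less than the first threshold.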
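  23. A network device, characterized in that the network device comprises a main control board and an interface board, wherein the main control board comprises a first processor configured to execute instructions to perform the steps corresponding to the control plane in claims 1 to 11, and the interface board comprises a second processor configured to execute instructions to perform the steps corresponding to the forwarding plane in claims 1 to 11.
  24. A computer-readable storage medium, characterized in that the storage medium stores at least one instruction, and the instruction is read by a processor to cause a forwarding plane and a control plane to perform the method according to any one of claims 1 to 11.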
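The steps of claims 1 to 11 can be illustrated with the Python sketch below. It is a minimal, non-authoritative model, not the applicant's implementation. It assumes that each sample is a single scalar (for example a delay in milliseconds), that the forwarding plane hands its counters to the control plane by a read-and-reset once per second time period, and that a change in the highest exceeded anomaly level re-raises the alarm at the new level. The names ForwardingPlane, ControlPlane, run, and cancel_periods (the number of consecutive quiet periods required by claim 11) are hypothetical. With a single-level threshold map, the per-level check of claims 4 to 6 reduces to the second-threshold check of claim 3.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ForwardingPlane:
    """Hypothetical forwarding-plane counters, one per anomaly level.

    level_thresholds maps anomaly level -> third threshold (claim 6: higher
    levels use higher thresholds). With a single level this is the
    second-threshold check of claim 3.
    """
    level_thresholds: Dict[int, float]
    counts: Dict[int, int] = field(init=False)

    def __post_init__(self) -> None:
        self.counts = {level: 0 for level in self.level_thresholds}

    def on_sample(self, value: float) -> None:
        # Called once per first time period (millisecond order per claim 2).
        for level, threshold in self.level_thresholds.items():
            if value >= threshold:
                self.counts[level] += 1

    def read_and_reset(self) -> Dict[int, int]:
        # Assumed handoff: read by the control plane once per second time period.
        snapshot = dict(self.counts)
        self.counts = {level: 0 for level in self.level_thresholds}
        return snapshot


class ControlPlane:
    """Hypothetical alarm logic for the control plane."""

    def __init__(self, first_threshold: int, cancel_periods: int) -> None:
        self.first_threshold = first_threshold
        self.cancel_periods = cancel_periods
        self.active_level: Optional[int] = None
        self._quiet = 0  # consecutive periods with every count below threshold

    def evaluate(self, counts: Dict[int, int]) -> str:
        # Claims 1 and 5: levels whose count exceeds the first threshold.
        exceeded = [lvl for lvl, c in counts.items() if c > self.first_threshold]
        if exceeded:
            self._quiet = 0
            level = max(exceeded)  # report the highest exceeded level
            if level != self.active_level:
                self.active_level = level
                return f"raise level {level}"  # e.g. send to the management device
            return "hold"
        if self.active_level is None:
            return "none"
        # Claim 11: cancel only after consecutive periods strictly below threshold.
        if all(c < self.first_threshold for c in counts.values()):
            self._quiet += 1
        else:
            self._quiet = 0  # a count equal to the threshold breaks the run
        if self._quiet >= self.cancel_periods:
            self.active_level = None
            self._quiet = 0
            return "cancel"
        return "hold"


def run(windows: List[List[float]], fp: ForwardingPlane,
        cp: ControlPlane) -> List[str]:
    """Feed one list of samples per second time period; return the actions."""
    actions = []
    for window in windows:
        for value in window:
            fp.on_sample(value)
        actions.append(cp.evaluate(fp.read_and_reset()))
    return actions
```

For example, with level thresholds {1: 20.0, 2: 50.0}, a first threshold of 2, and cancel_periods=2, the windows [[25, 60, 70, 55], [10, 12, 30, 5], [10, 11, 9, 8], [1, 2]] yield ["raise level 2", "hold", "cancel", "none"]. The first window contains four level-1 and three level-2 anomalies, and the next two windows are each below the first threshold at every level. The comparisons stay strict to mirror the claim wording: a count must be greater than the first threshold to raise the alarm and less than it to count toward cancellation, so a count equal to the first threshold does neither.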
PCT/CN2021/085816 2020-04-29 2021-04-07 Network performance monitoring method, network device, and storage medium WO2021218582A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
JP2022566251A JP2023523472A (ja) 2020-04-29 2021-04-07 Network performance monitoring method, network device, and storage medium
EP21796281.0A EP4131856A4 (en) 2020-04-29 2021-04-07 METHOD FOR MONITORING NETWORK PERFORMANCE, NETWORK DEVICE AND STORAGE MEDIUM
US17/976,491 US20230041307A1 (en) 2020-04-29 2022-10-28 Network Performance Monitoring Method, Network Device, and Storage Medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010359259.0A CN113572654B (zh) 2020-04-29 2020-04-29 Network performance monitoring method, network device, and storage medium
CN202010359259.0 2020-04-29

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/976,491 Continuation US20230041307A1 (en) 2020-04-29 2022-10-28 Network Performance Monitoring Method, Network Device, and Storage Medium

Publications (1)

Publication Number Publication Date
WO2021218582A1 (zh) 2021-11-04

Family

ID=78158889

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/085816 WO2021218582A1 (zh) 2020-04-29 2021-04-07 Network performance monitoring method, network device, and storage medium

Country Status (5)

Country Link
US (1) US20230041307A1 (zh)
EP (1) EP4131856A4 (zh)
JP (1) JP2023523472A (zh)
CN (1) CN113572654B (zh)
WO (1) WO2021218582A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
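CN115174358A (zh) * 2022-09-08 2022-10-11 Inspur Electronic Information Industry Co., Ltd. Monitoring and processing method, system, device, and storage medium for storage cluster interfaces
CN116390155A (zh) * 2023-06-02 2023-07-04 New H3C Technologies Co., Ltd. Packet transmission and reception control method and apparatus, electronic device, and storage medium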

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116501551B (zh) * 2023-06-21 2023-09-15 Shandong Yuanqiao Information Technology Co., Ltd. Data alarm generation and recovery processing method

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
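CN104778111A (zh) * 2014-01-14 2015-07-15 Shenzhen Tencent Computer Systems Co., Ltd. Alarm method and apparatus
CN106034056A (zh) * 2015-03-18 2016-10-19 Beijing Venustech Information Security Technology Co., Ltd. Method and system for service security analysis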
US20190222594A1 (en) * 2018-01-15 2019-07-18 International Business Machines Corporation Network flow control of Internet Of Things (IoT) devices

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7870243B1 (en) * 2000-04-11 2011-01-11 International Business Machines Corporation Method, system and program product for managing network performance
US8477648B2 (en) * 2010-02-16 2013-07-02 Vss Monitoring, Inc. Systems, apparatus, and methods for monitoring network capacity
US10558544B2 (en) * 2011-02-14 2020-02-11 International Business Machines Corporation Multiple modeling paradigm for predictive analytics
US9692775B2 (en) * 2013-04-29 2017-06-27 Telefonaktiebolaget Lm Ericsson (Publ) Method and system to dynamically detect traffic anomalies in a network
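CN103259682A (zh) * 2013-05-16 2013-08-21 Inspur Communication Information System Co., Ltd. Security assessment method for communication network elements based on multidimensional data aggregation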
US10516594B2 (en) * 2014-12-21 2019-12-24 Pismo Labs Technology Limited Systems and methods for changing the frequency of monitoring data
US11327737B2 (en) * 2017-04-21 2022-05-10 Johnson Controls Tyco IP Holdings LLP Building management system with cloud management of gateway configurations
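CN109392002B (zh) * 2017-08-11 2022-07-12 Huawei Technologies Co., Ltd. Method and device for reporting network performance parameters
JP2019179395A (ja) * 2018-03-30 2019-10-17 Omron Corporation Anomaly detection system, support device, and anomaly detection method
CN108768776A (zh) * 2018-05-30 2018-11-06 Zhengzhou Yunhai Information Technology Co., Ltd. OpenFlow-based network monitoring method and apparatus
CN109995599A (zh) * 2019-04-28 2019-07-09 Wuhan FiberHome Technical Services Co., Ltd. Intelligent alarm method for network performance anomalies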

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
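CN115174358A (zh) * 2022-09-08 2022-10-11 Inspur Electronic Information Industry Co., Ltd. Monitoring and processing method, system, device, and storage medium for storage cluster interfaces
CN115174358B (zh) * 2022-09-08 2023-01-17 Inspur Electronic Information Industry Co., Ltd. Monitoring and processing method, system, device, and storage medium for storage cluster interfaces
CN116390155A (zh) * 2023-06-02 2023-07-04 New H3C Technologies Co., Ltd. Packet transmission and reception control method and apparatus, electronic device, and storage medium
CN116390155B (zh) * 2023-06-02 2023-08-25 New H3C Technologies Co., Ltd. Packet transmission and reception control method and apparatus, electronic device, and storage medium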

Also Published As

Publication number Publication date
EP4131856A4 (en) 2023-09-20
US20230041307A1 (en) 2023-02-09
CN113572654A (zh) 2021-10-29
JP2023523472A (ja) 2023-06-05
CN113572654B (zh) 2023-11-14
EP4131856A1 (en) 2023-02-08

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 21796281

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2022566251

Country of ref document: JP

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 2021796281

Country of ref document: EP

Effective date: 20221103

NENP Non-entry into the national phase

Ref country code: DE