CN115842717A - Service fault positioning method and related equipment - Google Patents

Publication number
CN115842717A
CN115842717A CN202111097969.1A CN202111097969A CN115842717A CN 115842717 A CN115842717 A CN 115842717A CN 202111097969 A CN202111097969 A CN 202111097969A CN 115842717 A CN115842717 A CN 115842717A
Authority
CN
China
Legal status
Pending
Application number
CN202111097969.1A
Other languages
Chinese (zh)
Inventor
吴艳芹
吕田田
张乐
赵旭楠
云亮
Current Assignee
China Telecom Corp Ltd
Original Assignee
China Telecom Corp Ltd

Landscapes

  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

Embodiments of the present disclosure provide a service fault location method and related equipment. The method includes: acquiring end-to-end in-band telemetry performance index data and port data of a target service in a current acquisition period; performing end-to-end service quality classification evaluation on the end-to-end in-band telemetry performance index data and the port data to obtain an end-to-end service quality result; if the end-to-end service quality result is a first end-to-end service state, acquiring in-band telemetry performance index data and port data of each hop-by-hop link of the target service in the current acquisition period; performing hop-by-hop link quality classification evaluation on the in-band telemetry performance index data and the port data of each hop-by-hop link to obtain a hop-by-hop link service quality result for each hop-by-hop link; and locating the faulty link of the target service in the current acquisition period according to the hop-by-hop link service quality result of each hop-by-hop link.

Description

Service fault positioning method and related equipment
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a service fault location method, a service fault location apparatus, a computer device, a computer-readable storage medium, and a computer program product.
Background
In the internet and telecommunications field, existing network operation and maintenance mainly locates service faults by active measurement: probe packets are constructed, performance indexes such as delay and packet loss are measured on those probe packets, and the network quality is obtained indirectly and then analyzed. However, because this out-of-band measurement does not observe the actual service traffic, it cannot faithfully reproduce the problem point when locating service quality problems or faults, which increases the measurement cost and lowers the location efficiency.
In recent years, passive measurement, which directly measures actual service flows, has become a research focus. In-band flow-following detection has been implemented at the device level, but there is no mature technique for end-to-end service fault location based on flow-following detection results.
Disclosure of Invention
The embodiment of the disclosure provides a service fault positioning method, a service fault positioning device, a computer readable storage medium and a computer program product, which can realize more accurate service fault positioning.
An embodiment of the present disclosure provides a service fault location method, which includes: acquiring end-to-end in-band telemetry performance index data and port data of a target service in a current acquisition period; performing end-to-end service quality classification evaluation processing on the end-to-end in-band telemetry performance index data and the port data of the target service in the current acquisition period to obtain an end-to-end service quality result; if the end-to-end service quality result is a first end-to-end service state, acquiring in-band telemetry performance index data and port data of each hop-by-hop link of the target service in the current acquisition period; performing hop-by-hop link quality classification evaluation processing on the in-band telemetry performance index data and the port data of each hop-by-hop link of the target service in the current acquisition period to obtain a hop-by-hop link service quality result of each hop-by-hop link; and locating the faulty link of the target service in the current acquisition period according to the hop-by-hop link service quality result of each hop-by-hop link.
An embodiment of the present disclosure provides a service fault location apparatus, which includes: an end-to-end in-band telemetry performance index data acquisition unit, configured to acquire end-to-end in-band telemetry performance index data and port data of a target service in a current acquisition period; an end-to-end service quality result obtaining unit, configured to perform end-to-end service quality classification evaluation processing on the end-to-end in-band telemetry performance index data and the port data of the target service in the current acquisition period to obtain an end-to-end service quality result; a hop-by-hop in-band telemetry performance index data acquisition unit, configured to acquire in-band telemetry performance index data and port data of each hop-by-hop link of the target service in the current acquisition period if the end-to-end service quality result is a first end-to-end service state; a hop-by-hop link service quality result obtaining unit, configured to perform hop-by-hop link quality classification evaluation processing on the in-band telemetry performance index data and the port data of each hop-by-hop link of the target service in the current acquisition period to obtain a hop-by-hop link service quality result of each hop-by-hop link; and a faulty link location unit, configured to locate the faulty link of the target service in the current acquisition period according to the hop-by-hop link service quality result of each hop-by-hop link.
The disclosed embodiment provides a computer device, including: at least one processor; storage means for storing at least one program; when the at least one program is executed by the at least one processor, the method in any one of the possible implementations of the above embodiments is implemented.
The embodiment of the present disclosure provides a computer-readable storage medium, on which a computer program for a computer device to execute is stored, and the program, when executed by a processor, implements the method in any one of the possible implementation manners in the above embodiment.
Embodiments of the present disclosure provide a computer program product containing instructions. When the computer program product runs on a computer device, the instructions cause the computer device to perform the method of any one of the above aspects or any one of their possible implementations.
In the technical solutions provided by some embodiments of the present disclosure, on the one hand, end-to-end in-band telemetry performance index data and port data of a target service in a current acquisition period are acquired and then subjected to end-to-end service quality classification evaluation processing to obtain an end-to-end service quality result, which addresses end-to-end fault location at the network operation and maintenance management level. On the other hand, when the end-to-end service quality result is a first end-to-end service state, in-band telemetry performance index data and port data of each hop-by-hop link of the target service in the current acquisition period are acquired and subjected to hop-by-hop link quality classification evaluation processing to obtain a hop-by-hop link service quality result of each hop-by-hop link, and the faulty link of the target service in the current acquisition period is located according to those results. Combining end-to-end service quality classification evaluation with hop-by-hop link quality classification evaluation in this way enables accurate location of service faults and efficient network operation and maintenance.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure. It should be apparent that the drawings in the following description are merely examples of the disclosure and that other drawings may be derived by those of ordinary skill in the art without inventive effort.
Fig. 1 schematically shows a flow chart of a service fault location method according to an embodiment of the present disclosure.
Fig. 2 schematically illustrates an application scenario diagram of a service fault location method according to an embodiment of the present disclosure.
Fig. 3 schematically shows a flow chart of a service fault location method according to another embodiment of the present disclosure.
Fig. 4 schematically shows a schematic block diagram of a service fault location device according to an embodiment of the present disclosure.
Fig. 5 schematically shows a schematic block diagram of a computer device according to an embodiment of the present disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art.
In the description of the present disclosure, unless otherwise specified, "/" means "or"; for example, A/B may denote A or B. "And/or" herein merely describes an association between associated objects and covers three relationships; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. Further, "at least one" means one or more, and "a plurality" means two or more. The terms "first", "second", and the like do not limit number or execution order, and objects so labeled are not necessarily different.
The embodiment of the present disclosure does not particularly limit a specific structure of an execution subject of the method provided by the embodiment of the present disclosure, as long as processing can be performed according to the method provided by the embodiment of the present disclosure by running a program recorded with a code of the method provided by the embodiment of the present disclosure, for example, the execution subject of the method provided by the embodiment of the present disclosure may be a computer device, or a functional module capable of calling a program and executing the program in the computer device.
It is to be understood that the network architecture and the service scenario described in the embodiment of the present disclosure are for more clearly illustrating the technical solution of the embodiment of the present disclosure, and do not constitute a limitation to the technical solution provided in the embodiment of the present disclosure, and as the network architecture evolves and a new service scenario appears, a person having ordinary skill in the art may know that the technical solution provided in the embodiment of the present disclosure is also applicable to similar technical problems.
Fig. 1 schematically shows a flow chart of a service fault location method according to an embodiment of the present disclosure.
As shown in fig. 1, the method provided by the embodiment of the present disclosure may include the following steps.
In step S110, end-to-end in-band telemetry performance index data and port data of the target service in the current acquisition period are obtained.
In an exemplary embodiment, acquiring end-to-end in-band telemetry performance index data and port data of a target service in a current acquisition period may include: setting in-band telemetry instance configuration information, wherein the in-band telemetry instance configuration information includes the target service, an acquisition period, and a detection mode; if the detection mode is an end-to-end mode, issuing an end-to-end in-band telemetry instance to the head node device through which the target service passes, so that the head node device activates the end-to-end in-band telemetry instance and in-band telemetry data of the head node device and the destination node device is collected; receiving the in-band telemetry data of the head node device and the destination node device; and obtaining the end-to-end in-band telemetry performance index data of the target service in the current acquisition period according to the in-band telemetry data of the head node device and the destination node device and the end-to-end port data of the target service in the current acquisition period.
In an exemplary embodiment, the end-to-end port data may include a head port rate of the head node device and a destination port rate of the destination node device.
In an exemplary embodiment, the in-band telemetry data of the head node device and the destination node device may include destination port traffic of the destination node device, head port traffic of the head node device, destination port receive time of the destination node device, head port transmit time of the head node device, the number of packets sent by the head node device, and the number of packets received by the destination node device.
In an exemplary embodiment, the end-to-end in-band telemetry performance index data may include a head port bandwidth utilization rate of the head node device, a destination port bandwidth utilization rate of the destination node device, an end-to-end one-way delay, and an end-to-end packet loss rate.
Obtaining the end-to-end in-band telemetry performance index data of the target service in the current acquisition period according to the in-band telemetry data of the head node device and the destination node device and the end-to-end port data of the target service in the current acquisition period may include: obtaining the head port bandwidth utilization rate according to the head port traffic and the head port rate; obtaining the destination port bandwidth utilization rate according to the destination port traffic and the destination port rate; obtaining the end-to-end one-way delay according to the destination port receive time and the head port transmit time; and obtaining the end-to-end packet loss rate according to the number of packets sent by the head node device and the number of packets received by the destination node device.
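The four end-to-end indexes described above can be illustrated with a minimal Python sketch. The disclosure names the inputs of each index but not the exact formulas, so the formulas below (traffic divided by rate times period, receive time minus transmit time, lost packets divided by sent packets) are assumptions, as are all function and parameter names.

```python
def bandwidth_utilization(traffic_bits: float, port_rate_bps: float,
                          period_s: float) -> float:
    """Fraction of port capacity used during one acquisition period.

    Assumed formula: traffic carried in the period divided by the most
    the port could have carried in the same period.
    """
    if port_rate_bps <= 0 or period_s <= 0:
        raise ValueError("port rate and period must be positive")
    return traffic_bits / (port_rate_bps * period_s)


def one_way_delay(dest_port_rx_time_s: float, head_port_tx_time_s: float) -> float:
    """Assumed formula: destination receive time minus head transmit time."""
    return dest_port_rx_time_s - head_port_tx_time_s


def packet_loss_rate(packets_sent: int, packets_received: int) -> float:
    """Assumed formula: share of sent packets that never reached the destination."""
    if packets_sent == 0:
        return 0.0  # nothing was sent, so nothing can have been lost
    return (packets_sent - packets_received) / packets_sent
```

The one-way delay is only meaningful if the head node device and the destination node device have synchronized clocks, which this sketch takes for granted.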
In an exemplary embodiment, obtaining the end-to-end in-band telemetry performance index data of the target service in the current acquisition period according to the in-band telemetry data of the head node device and the destination node device and the end-to-end port data of the target service in the current acquisition period may further include: if bidirectional service flow data exists between the head node device and the destination node device, acquiring the head port receive time of the head node device and the destination port transmit time of the destination node device; and obtaining an end-to-end two-way delay according to the destination port receive time, the head port transmit time, the head port receive time, and the destination port transmit time. In this case, the end-to-end in-band telemetry performance index data further includes the end-to-end two-way delay.
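One plausible way to combine the four timestamps is to add the forward one-way delay to the reverse one-way delay. The disclosure lists the four inputs but does not state the formula, so the sum below is an assumption, and the function and parameter names are hypothetical.

```python
def two_way_delay(head_port_tx_time_s: float, dest_port_rx_time_s: float,
                  dest_port_tx_time_s: float, head_port_rx_time_s: float) -> float:
    """Assumed formula: forward one-way delay plus reverse one-way delay."""
    forward = dest_port_rx_time_s - head_port_tx_time_s  # head -> destination
    reverse = head_port_rx_time_s - dest_port_tx_time_s  # destination -> head
    return forward + reverse
```

With this formula, a constant clock offset between the two devices cancels out, because it enters the forward and reverse terms with opposite signs.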
In step S120, performing end-to-end service quality classification evaluation processing on the end-to-end in-band telemetry performance index data and the port data of the target service in the current acquisition period to obtain an end-to-end service quality result.
In an exemplary embodiment, the in-band telemetry data of the head node device and the destination node device may further include a transmission timestamp of the head node device and a reception timestamp of the destination node device.
Performing end-to-end service quality classification evaluation processing on the end-to-end in-band telemetry performance index data and the port data of the target service in the current acquisition period to obtain an end-to-end service quality result may include: performing time characteristic processing on the receive timestamp and the send timestamp to obtain time characteristic data; and calling an end-to-end service quality classification evaluation model to process the number of packets sent by the head node device, the number of packets received by the destination node device, the time characteristic data, the head port bandwidth utilization rate, the destination port bandwidth utilization rate, the end-to-end one-way delay, the end-to-end packet loss rate, and the end-to-end port data to obtain the end-to-end service quality result.
In an exemplary embodiment, the method may further include: acquiring end-to-end historical in-band telemetry performance index data, historical port data, historical time characteristic data, and an end-to-end service quality label of the target service in a historical acquisition period; and training a first machine learning model with the end-to-end historical in-band telemetry performance index data, the historical port data, the historical time characteristic data, and the end-to-end service quality label of the target service in the historical acquisition period to obtain the end-to-end service quality classification evaluation model.
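The disclosure does not say which time characteristics are derived from the two timestamps. As a purely illustrative sketch, the Python function below assumes calendar features (hour of day, day of week, weekend flag) plus the elapsed time between sending and receiving; the feature set and all names are hypothetical.

```python
from datetime import datetime, timezone


def time_features(send_timestamp_s: float, receive_timestamp_s: float) -> dict:
    """Derive illustrative time characteristic data from two Unix timestamps.

    The chosen features are assumptions; a real deployment would pick
    whatever the classification model was trained on.
    """
    sent = datetime.fromtimestamp(send_timestamp_s, tz=timezone.utc)
    return {
        "hour": sent.hour,                        # 0..23, UTC
        "weekday": sent.weekday(),                # Monday = 0 .. Sunday = 6
        "is_weekend": int(sent.weekday() >= 5),   # 1 on Saturday or Sunday
        "elapsed_s": receive_timestamp_s - send_timestamp_s,
    }
```

The resulting dictionary could then be concatenated with the performance indexes and port rates to form the input of the classification evaluation model.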
In an exemplary embodiment in which the end-to-end two-way delay has been obtained, the in-band telemetry data of the head node device and the destination node device may likewise further include a send timestamp of the head node device and a receive timestamp of the destination node device, and the end-to-end service quality classification evaluation processing may include: performing time characteristic processing on the receive timestamp and the send timestamp to obtain time characteristic data; and calling the end-to-end service quality classification evaluation model to process the number of packets sent by the head node device, the number of packets received by the destination node device, the time characteristic data, the head port bandwidth utilization rate, the destination port bandwidth utilization rate, the end-to-end one-way delay, the end-to-end two-way delay, the end-to-end packet loss rate, and the end-to-end port data to obtain the end-to-end service quality result.
In step S130, if the end-to-end service quality result is the first end-to-end service state, in-band telemetry performance index data and port data of each hop-by-hop link of the target service in the current acquisition period are obtained.
In an exemplary embodiment, if the end-to-end service quality result is a first end-to-end service state, acquiring in-band telemetry performance index data and port data of each hop-by-hop link of the target service in the current acquisition period may include: if the end-to-end service quality result is the first end-to-end service state, adjusting a detection mode in the in-band telemetry instance configuration information from the end-to-end mode to a hop-by-hop mode, performing hop-by-hop detection on each node device passed by the target service, and acquiring in-band telemetry data and port data of each hop-by-hop link; and acquiring in-band telemetry performance index data of each hop-by-hop link according to the in-band telemetry data of each hop-by-hop link.
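The switch from end-to-end detection to hop-by-hop detection can be sketched as a small control function. This is an illustration under stated assumptions: the label `DEGRADED` stands in for the "first end-to-end service state", the configuration is modeled as a plain dictionary, and the three callables are hypothetical placeholders for data collection and the classification model.

```python
from typing import Callable, Optional

DEGRADED = "degraded"  # hypothetical label for the first end-to-end service state


def run_acquisition_period(config: dict,
                           collect_end_to_end: Callable[[dict], dict],
                           classify_end_to_end: Callable[[dict], str],
                           collect_hop_by_hop: Callable[[dict], dict]) -> Optional[dict]:
    """Run one period: end-to-end first, hop-by-hop only when needed.

    Returns the hop-by-hop data when the end-to-end result is DEGRADED,
    otherwise None.
    """
    config["detection_mode"] = "end_to_end"
    end_to_end_data = collect_end_to_end(config)
    if classify_end_to_end(end_to_end_data) != DEGRADED:
        return None  # service is healthy; no hop-by-hop detection needed
    # Adjust the detection mode and probe every node on the path.
    config["detection_mode"] = "hop_by_hop"
    return collect_hop_by_hop(config)
```

Running hop-by-hop detection only after the end-to-end result indicates a problem keeps the more expensive per-hop collection off the common, healthy path.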
In step S140, performing hop-by-hop link quality classification evaluation processing on the in-band telemetry performance index data and the port data of each hop-by-hop link of the target service in the current acquisition period, to obtain a hop-by-hop link service quality result of each hop-by-hop link.
In an exemplary embodiment, performing hop-by-hop link quality classification evaluation processing on the in-band telemetry performance index data and the port data of each hop-by-hop link of the target service in the current acquisition period to obtain a hop-by-hop link service quality result of each hop-by-hop link may include: calling a hop-by-hop link quality classification evaluation model to process the in-band telemetry performance index data and the port data of each hop-by-hop link to obtain the hop-by-hop link service quality result of each hop-by-hop link.
In step S150, the faulty link of the target service in the current acquisition period is located according to the hop-by-hop link service quality result of each hop-by-hop link.
In an exemplary embodiment, locating the faulty link of the target service in the current acquisition period according to the hop-by-hop link service quality result of each hop-by-hop link may include: taking each hop-by-hop link whose hop-by-hop link service quality result is a first hop-by-hop service state as a faulty link; acquiring the link identifier of the faulty link and the source node device identifier and sink node device identifier corresponding to the faulty link; and obtaining the network address of the source node device corresponding to the faulty link according to the source node device identifier, obtaining the network address of the sink node device corresponding to the faulty link according to the sink node device identifier, and determining the autonomous system domain in which the faulty link is located according to the link identifier of the faulty link.
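The location step can be sketched in Python as a filter followed by three lookups. The data structures are assumptions made for illustration: the label `"fault"` stands in for the first hop-by-hop service state, and the two dictionaries stand in for whatever inventory maps node identifiers to network addresses and link identifiers to autonomous system domains.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HopLink:
    link_id: str
    source_node_id: str
    sink_node_id: str


def locate_faulty_links(results: dict, links: list, node_addresses: dict,
                        link_domains: dict, fault_state: str = "fault") -> list:
    """Return one record per hop-by-hop link classified as faulty.

    results maps link_id -> hop-by-hop link service quality result.
    node_addresses maps node id -> network address (hypothetical inventory).
    link_domains maps link_id -> autonomous system domain (hypothetical).
    """
    located = []
    for link in links:  # keep path order so records read head to tail
        if results.get(link.link_id) != fault_state:
            continue
        located.append({
            "link_id": link.link_id,
            "source_address": node_addresses[link.source_node_id],
            "sink_address": node_addresses[link.sink_node_id],
            "as_domain": link_domains[link.link_id],
        })
    return located
```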
In the service fault location method provided by the embodiments of the present disclosure, on the one hand, end-to-end in-band telemetry performance index data and port data of a target service in a current acquisition period are acquired and then subjected to end-to-end service quality classification evaluation processing to obtain an end-to-end service quality result, which addresses end-to-end fault location at the network operation and maintenance management level. On the other hand, when the end-to-end service quality result is a first end-to-end service state, in-band telemetry performance index data and port data of each hop-by-hop link of the target service in the current acquisition period are acquired and subjected to hop-by-hop link quality classification evaluation processing to obtain a hop-by-hop link service quality result of each hop-by-hop link, and the faulty link of the target service in the current acquisition period is located according to those results. Combining end-to-end service quality classification evaluation with hop-by-hop link quality classification evaluation in this way enables accurate location of service faults and efficient network operation and maintenance.
The method provided by the embodiment of the disclosure is a service fault positioning scheme based on in-band telemetry, and can be applied to the field of IP (Internet Protocol) networks and the technical field of big data.
The solutions in the related art have at least the following problems:
1) The in-band telemetry technology in the related technology mainly solves the problems of data message construction and transmission on the equipment level, and does not solve the problem of service fault location on the network operation and maintenance management level.
2) In the in-band telemetry technology in the related art, the data packet sampling rate of a device is set by the device itself or by an external management system. This improves the efficiency of inserting and processing in-band telemetry OAM (operation, administration and maintenance) information, but it does not solve the problem of end-to-end fault location at the network operation and maintenance management level.
3) The related technology mainly solves the problem of how to realize telemetry in the band, and does not provide the result analysis and positioning technology of in-band telemetry.
In summary, the service quality evaluation/fault location in the related art is mainly based on out-of-band measurement, that is, a detection message for measurement is constructed outside the service flow for measurement, so that the problem point cannot be truly reproduced, the measurement cost is increased, and the fault location efficiency is low; although in-band telemetry has emerged, no implementation is disclosed for its application to service fault location.
The above method provided by the embodiment of the present disclosure is illustrated with reference to fig. 2 and 3, but the present disclosure is not limited thereto.
The scheme for locating service faults or problem points based on in-band telemetry obtains end-to-end and hop-by-hop performance data of the service flow of the target service from the performance data reported by in-band telemetry (including the in-band telemetry data reported by the head node device, the in-band telemetry data reported by the destination node device, and the in-band telemetry data of each hop-by-hop link). The end-to-end performance data includes the end-to-end in-band telemetry performance index data, and the hop-by-hop performance data includes the in-band telemetry performance index data of each hop-by-hop link. The end-to-end and hop-by-hop performance data may include any one or more of delay, packet loss rate, bandwidth utilization rate, traffic, bandwidth, and the like. The delay measured end to end is referred to as the end-to-end delay; it may include an end-to-end one-way delay (also referred to as an end-to-end service one-way delay) and, in some embodiments, an end-to-end two-way delay (also referred to as an end-to-end service two-way delay). The packet loss rate measured end to end is referred to as the end-to-end packet loss rate. The end-to-end bandwidth utilization rate may include the head port bandwidth utilization rate of the head node device and the destination port bandwidth utilization rate of the destination node device. The end-to-end traffic may include the destination port traffic of the destination node device and the head port traffic of the head node device. The hop-by-hop delay may include the one-way delay of each hop-by-hop link and, in some embodiments, the two-way delay of each hop-by-hop link. Taking the IP quadruplet (source IP address, destination IP address, source port, destination port) of the service flow, the rate of each physical port, the end-to-end and hop-by-hop performance data, and the time characteristic data as features, the end-to-end service quality classification evaluation model and the hop-by-hop link quality classification evaluation model are used to locate poor-service-quality problems along the flow.
Fig. 2 schematically illustrates an application scenario diagram of a service fault location method according to an embodiment of the present disclosure.
The system architecture for implementing the method provided by the embodiment of the present disclosure is shown in fig. 2, and fig. 2 is an architecture diagram of an in-band telemetry-based service problem/fault location system 200, and the system 200 may include an in-band telemetry instance setting module 210, a network element configuration issuing module 220, a data acquisition module 230, a data preprocessing module 240, a data storage module 250, an end-to-end service quality evaluation module 260, a fault point location module 270, a fault risk calculation module 280, a fault location definition segment module 290, and a result sending module 2100.
The client device 300 in fig. 2 may be a client-side edge network device.
In fig. 2, routers are taken as an example of the node devices through which the target service passes, but the present disclosure is not limited thereto. The node devices are labeled router 1, router 2, router 3, router 4, router 5, router 6, and router 7. Each router in the embodiment of fig. 2 may be an IP network router device supporting in-band telemetry, for example, an SRv6 device (a device that forwards IPv6 (Internet Protocol Version 6) data packets using SR (Segment Routing)) that supports in-band telemetry.
The customer service center 600 in the embodiment of fig. 2 may be a cloud server cluster through which the customer provides services.
When a client device sends service flow data of a target service to the customer service center through router 1 to router 7 in sequence (referred to as service flow data in a first direction), router 1 may be referred to as the head node device (also called the first node device, or head node for short), and router 7 may be referred to as the destination node device (or tail node for short). Each pair of adjacent routers forms one hop-by-hop link: router 1 to router 2, router 2 to router 3, router 3 to router 4, router 4 to router 5, router 5 to router 6, and router 6 to router 7. In each hop-by-hop link, the upstream router may be referred to as the source node device and the downstream router as the sink node device; for example, in the link from router 1 to router 2, router 1 is the source node device and router 2 is the sink node device.
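The decomposition of a path into hop-by-hop links, each with a source node device and a sink node device, can be written in a few lines of Python. The function name and the use of router labels as plain strings are illustrative choices, not part of the disclosure.

```python
def hop_by_hop_links(path: list) -> list:
    """Pair each node with its successor: (source node, sink node) per hop."""
    return list(zip(path, path[1:]))


# The seven routers of fig. 2, in the first direction of the service flow.
routers = [f"router {i}" for i in range(1, 8)]
links = hop_by_hop_links(routers)
head_node, tail_node = routers[0], routers[-1]
```

For the seven routers of fig. 2 this yields six links, the first being router 1 to router 2 and the last being router 6 to router 7.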
The in-band telemetry instance setting module 210 in the embodiment of fig. 2 may be used to set in-band telemetry instance configuration information, including an in-band telemetry data collection object (e.g., the circuit in which a target service issued by the client device 300 is located), a collection period, a detection mode (end-to-end mode or hop-by-hop mode), and a default performance threshold. The default performance threshold corresponds to the Service Level Agreement (SLA) that guarantees the end-to-end service level of the customer service. The SLA settings for delay, packet loss rate, bandwidth utilization, and other indexes may be used as default values when the SLA is left at its defaults, or may be modified on the basis of the default values; they are stored in the data storage module 250 for calculation.
The network element configuration issuing module 220 in the embodiment of fig. 2 may be configured to issue the in-band telemetry instance configuration information set by the in-band telemetry instance setting module 210 to the network routers and to activate the in-band telemetry instance. Depending on the vendor of each node device through which the target service passes, some in-band telemetry instance configuration information may be issued only to the first node device and the last node device, and some may need to be issued globally, that is, to all node devices through which the target service passes. For example, when the detection mode is the end-to-end mode, the configuration information may be issued to the router of the first node, or to the routers of the first node and the last node; when the detection mode is the hop-by-hop mode, it may be issued only to the router of the head node, or may be issued globally, depending on what the devices support. The network element configuration issuing module 220 can also handle the differences between the configuration interface protocols of different manufacturers.
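The issuing rule described above can be sketched as a small selection function. This is only an illustration: the names `DetectionMode` and `config_targets` and the two boolean flags are hypothetical, standing in for whatever vendor capability information the module actually consults.

```python
from enum import Enum


class DetectionMode(Enum):
    END_TO_END = "end-to-end"
    HOP_BY_HOP = "hop-by-hop"


def config_targets(path, mode, tail_required=False, global_required=False):
    """Pick the routers that should receive the telemetry configuration.

    path: ordered router names from head node to tail node.
    tail_required / global_required: hypothetical flags describing what the
    vendor's devices need; the text only says this depends on device support.
    """
    if not path:
        raise ValueError("path must contain at least one router")
    head, tail = path[0], path[-1]
    if mode is DetectionMode.HOP_BY_HOP:
        # Hop-by-hop mode: head node only, or every node on the path.
        return list(path) if global_required else [head]
    # End-to-end mode: head node only, or head and tail nodes.
    return [head, tail] if tail_required else [head]


path = [f"router{i}" for i in range(1, 8)]  # router1 ... router7 as in fig. 2
print(config_targets(path, DetectionMode.END_TO_END, tail_required=True))
print(len(config_targets(path, DetectionMode.HOP_BY_HOP, global_required=True)))
```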
The data collection module 230 in the embodiment of fig. 2 may be configured to collect port data of each node device (for example, each router, which may also be referred to as a network router) through which the target service passes. The port data may include, for example, a port number, a port type, and a port rate, including the head port rate of the head node device and the destination port rate of the destination node device. The module may also receive the in-band telemetry data reported by the network routers through which the target service passes. The in-band telemetry data includes the traffic sent by the head and tail nodes for the target service, the numbers of packets sent and received, and the sending and receiving timestamps. The traffic sent by the head node may also be referred to as the head port traffic of the head node device, and the traffic sent by the tail node may also be referred to as the destination port traffic of the destination node device. The numbers of packets sent and received include those of the head node device and those of the destination node device, and the timestamps may include the sending and receiving timestamps of the head node device and the receiving timestamp of the destination node device.
The data preprocessing module 240 in the embodiment of fig. 2 may be configured to process the collected port data and in-band telemetry data (including end-to-end in-band telemetry data), for example, calculate a maximum/average bandwidth utilization rate, a maximum/average packet loss rate, a maximum/average bidirectional delay, and the like, process the end-to-end service SLA and the number of nodes of the service flow path to obtain hop-by-hop SLA data, and process association between the port data and the in-band telemetry data.
The performance index in the embodiment of the present disclosure may be calculated as a maximum performance index value in a certain time period, or may be calculated as an average value, and is set according to requirements, for example, the bandwidth utilization rate is obtained as a maximum bandwidth utilization rate or an average bandwidth utilization rate in a certain time period, the packet loss rate is obtained as a maximum packet loss rate or an average packet loss rate in a certain time period, and the bidirectional delay is obtained as a maximum bidirectional delay or an average bidirectional delay in a certain time period.
In the embodiment of the disclosure, the SLA itself sets the end-to-end performance indexes of the service. When the hop-by-hop service quality is calculated, the hop-by-hop performance index thresholds need to be derived from the SLA values; among them, only the delay threshold differs from the corresponding SLA performance index. The hop-by-hop delay threshold can be obtained by dividing the delay set in the SLA by the number of end-to-end links.
For example, assuming that the bidirectional delay set in the SLA is 10 seconds and the path includes five links in total, the hop-by-hop bidirectional delay threshold is 10/5 = 2 seconds.
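The threshold derivation can be sketched in a few lines of Python. The function name and dictionary keys are hypothetical, and the packet loss value in the sample SLA is a placeholder rather than a figure from the disclosure; only the delay is divided by the number of links, while the other indexes are reused unchanged.

```python
def hop_by_hop_thresholds(sla: dict, num_links: int) -> dict:
    """Derive hop-by-hop thresholds from an end-to-end SLA.

    Only the delay is divided by the number of links; every other
    index keeps its end-to-end value. Key names are illustrative.
    """
    if num_links < 1:
        raise ValueError("num_links must be at least 1")
    thresholds = dict(sla)  # copy so the end-to-end SLA is not modified
    thresholds["delay"] = sla["delay"] / num_links
    return thresholds


# Delay of 10 seconds over five links, as in the example in the text.
sla = {"delay": 10.0, "loss_rate": 0.01}
print(hop_by_hop_thresholds(sla, 5))
```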
Time characteristic data can be further constructed in the embodiment of the disclosure. The time characteristic data can include a holiday time characteristic (for example, a holiday is labeled 1 and a normal workday is labeled 0) and a busy/idle time characteristic for normal workdays (for example, 8:00 to 17:00 is busy, and 17:00 to 8:00 the next day is idle).
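A minimal sketch of how these two time features might be derived from a timestamp is shown below. The function name, the holiday calendar passed in as a set, and the choice to treat 17:00 itself as idle are assumptions made for illustration; the disclosure does not specify them.

```python
from datetime import date, datetime


def time_features(ts: datetime, holidays: set) -> dict:
    """Return the holiday flag and the busy/idle flag for one timestamp.

    holidays: a caller-supplied set of dates (hypothetical calendar source).
    """
    is_holiday = 1 if ts.date() in holidays else 0
    # Busy from 08:00 up to, but not including, 17:00; otherwise idle.
    # Treating the 17:00 boundary as idle is an assumption.
    is_busy = 1 if 8 <= ts.hour < 17 else 0
    return {"holiday": is_holiday, "busy": is_busy}


sample_holidays = {date(2021, 10, 1)}  # sample calendar entry
print(time_features(datetime(2021, 10, 1, 9, 30), sample_holidays))
print(time_features(datetime(2021, 9, 17, 18, 0), sample_holidays))
```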
For example, the in-band telemetry performance index data used to construct the end-to-end quality of service classification evaluation model and the hop-by-hop link quality of service classification evaluation model may be calculated using the following formulas:
Port bandwidth utilization = port traffic / port rate (1)
In the above formula (1), when end-to-end service quality evaluation is performed, the port bandwidth utilization includes a head port bandwidth utilization and a destination port bandwidth utilization, the port traffic includes a head port traffic and a destination port traffic, and the port rate includes a head port rate and a destination port rate. To calculate the head port bandwidth utilization, the head port traffic is divided by the head port rate; to calculate the destination port bandwidth utilization, the destination port traffic is divided by the destination port rate. Finally, the maximum of the head port bandwidth utilization and the destination port bandwidth utilization is taken as the bandwidth utilization index to be evaluated for the end-to-end service quality.
When the service quality of the hop-by-hop link is evaluated, the bandwidth utilization rate of the source port of each section of link is calculated to be used as a bandwidth utilization rate index of the hop-by-hop service quality needing to be evaluated.
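As an illustration of formula (1) and the two ways it is applied, the sketch below computes the end-to-end index as the maximum of the head and destination port utilizations, and the hop-by-hop index from the source port alone. The function names are hypothetical, and traffic and rate are assumed to be expressed in the same unit.

```python
def port_utilization(traffic: float, rate: float) -> float:
    """Formula (1): port bandwidth utilization = port traffic / port rate."""
    if rate <= 0:
        raise ValueError("port rate must be positive")
    return traffic / rate


def end_to_end_utilization(head_traffic, head_rate, dest_traffic, dest_rate):
    """End-to-end index: the larger of head and destination port utilization."""
    return max(port_utilization(head_traffic, head_rate),
               port_utilization(dest_traffic, dest_rate))


def hop_utilization(source_traffic, source_rate):
    """Hop-by-hop index: utilization of the link's source port."""
    return port_utilization(source_traffic, source_rate)


# Traffic and rate in the same (assumed) unit, e.g. Mbit/s.
print(end_to_end_utilization(250, 1000, 500, 1000))
print(hop_utilization(250, 1000))
```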
One-way delay = sink port receiving time - source port sending time (2)
In formula (2), the one-way delay includes an end-to-end one-way delay. When the end-to-end one-way delay in the first direction is calculated, the sink port receiving time may be the destination port receiving time of the destination node device, determined according to the receiving timestamp of the destination node device, and the source port sending time may be the head port sending time of the head node device, determined according to the sending timestamp of the head node device; that is, end-to-end one-way delay = destination port receiving time - head port sending time. When the end-to-end one-way delay in the second direction is calculated, the sink port receiving time may be the head port receiving time of the head node device, determined according to the receiving timestamp of the head node device, and the source port sending time may be the destination port sending time of the destination node device, determined according to the sending timestamp of the destination node device; that is, end-to-end one-way delay = head port receiving time - destination port sending time. In this case the head node and the tail node are actually exchanged, and only the names used for the first direction are retained here.
Bidirectional delay = (sink port receiving time - source port sending time) + (source port receiving time - sink port sending time) (3)
The bidirectional delay in the above formula (3) may include an end-to-end bidirectional delay, the source port receiving time may include the head port receiving time, and the sink port sending time may include the destination port sending time; that is, end-to-end bidirectional delay = (destination port receiving time - head port sending time) + (head port receiving time - destination port sending time). An end-to-end service refers to a service from an input port (also referred to as a head port) on the first node device to an output port (also referred to as a destination port) on the last node device (also referred to as a last node) in a telecommunication network. Because a service flow is directional, end-to-end service quality also covers two directions: the head and tail nodes are interchangeable, and the two directions are two different service flows. For example, taking fig. 2 as an example, when the customer service center sends service flow data (in a second direction) to the client device through router 7 to router 1 in sequence, the tail node actually becomes the head node, and the head node becomes the tail node.
In the embodiment of the disclosure, whether service flow data exist from an end-to-end first node to a tail node and from the tail node to the first node is firstly inquired, so as to judge whether bidirectional time delay needs to be calculated, if the service flow data exist in a bidirectional mode, the bidirectional time delay is calculated, and if the service flow data exist in a unidirectional mode, the bidirectional time delay does not need to be calculated.
Packet loss rate = (number of packets sent - number of packets received) / number of packets sent (4)
In the above formula (4), the packet loss rate includes an end-to-end packet loss rate; that is, end-to-end packet loss rate = (number of packets sent by the first node device - number of packets received by the destination node device) / number of packets sent by the first node device.
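Formulas (2) to (4), together with the rule that the bidirectional delay is only computed when traffic exists in both directions, can be sketched as follows. All function names are hypothetical, timestamps are assumed to come from synchronized clocks and to share one unit (milliseconds in the sample), and the sample values are made up.

```python
def one_way_delay(sink_recv, source_send):
    """Formula (2): sink port receiving time - source port sending time."""
    return sink_recv - source_send


def bidirectional_delay(sink_recv, source_send, source_recv, sink_send):
    """Formula (3): sum of the one-way delays in the two directions."""
    return (sink_recv - source_send) + (source_recv - sink_send)


def packet_loss_rate(sent, received):
    """Formula (4): (packets sent - packets received) / packets sent."""
    if sent <= 0:
        raise ValueError("number of packets sent must be positive")
    return (sent - received) / sent


def end_to_end_delays(first_dir, second_dir=None):
    """first_dir and second_dir are (send_time, receive_time) pairs.

    second_dir is None when no traffic flows in the second direction,
    in which case the bidirectional delay is not calculated.
    """
    result = {"one_way": one_way_delay(first_dir[1], first_dir[0])}
    if second_dir is not None:
        result["bidirectional"] = bidirectional_delay(
            first_dir[1], first_dir[0], second_dir[1], second_dir[0])
    return result


print(end_to_end_delays((0, 30), (100, 125)))
print(end_to_end_delays((0, 30)))
print(packet_loss_rate(100, 75))
```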
The data storage module 250 in the embodiment of fig. 2 may use big data techniques to store the acquired massive in-band telemetry data and port data efficiently and quickly, and may also be used to store preprocessed data and data generated by calculation, such as in-band telemetry performance index data.
The end-to-end service quality evaluation module 260 in the embodiment of fig. 2 may invoke an end-to-end service quality classification evaluation model in the fault risk calculation module 280 to perform end-to-end service quality classification evaluation processing according to performance data (in-band telemetry data) reported by in-band telemetry and port data of each node device, evaluate the end-to-end service quality, determine that the end-to-end service quality is normal, poor (short for poor quality) or interrupted, refer to an end-to-end service quality result determined to be interrupted or poor quality as a first end-to-end service state, and refer to an end-to-end service quality result determined to be normal as a second end-to-end service state. If the end-to-end quality of service is poor or interrupted, the detection mode in the in-band telemetry instance setting module 210 is adjusted to be a hop-by-hop mode, and hop-by-hop detection is performed.
The fault point location module 270 in the embodiment of fig. 2 may be configured to, for a target service whose end-to-end quality of service is poor or interrupted, further perform a hop-by-hop quality of service evaluation by combining hop-by-hop in-band telemetry data and port data, and locate a fault link and a port (referred to as a fault point) where the quality of service is poor or interrupted, so as to obtain a fault location result.
The fault definition segment module 290 in the embodiment of fig. 2 may be configured to locate, based on the output of the fault point location module 270, the segment where the poor quality or the interruption occurs, so as to determine a quality-difference definition segment result, for example, the metro network A, the backbone network CN2, and/or the metro network B shown in fig. 2, and thereby determine the area in which service quality optimization is to be carried out.
The fault risk calculation module 280 in the embodiment of fig. 2 may train an end-to-end service quality classification evaluation model using a first machine learning model, and train a hop-by-hop link service quality classification evaluation model using a second machine learning model.
In the embodiment of the present disclosure, any suitable machine learning algorithm may be used for the first machine learning model and the second machine learning model, and the first machine learning model and the second machine learning model may be the same or different.
In the embodiment of fig. 2, the fault risk calculation module 280 may input the processed end-to-end in-band telemetry performance index data, port data, and time feature data as features into the trained end-to-end service quality classification evaluation model, which is convenient for the end-to-end service quality evaluation module 260 to call for end-to-end service quality evaluation; and inputting the processed in-band telemetry performance index data, port data and time characteristic data of the hop-by-hop link as characteristics into a trained hop-by-hop link service quality classification evaluation model, so that the fault point positioning module 270 can call the data to evaluate the hop-by-hop service quality.
In the embodiment of the present disclosure, the XGBoost algorithm may be represented by the following formula:
$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in F \qquad (5)$$
In the above formula (5), i indexes the samples and is a positive integer greater than or equal to 1 and less than or equal to n, where n is a positive integer greater than or equal to 1 denoting the number of packets sent by the head node; the future quality of service can be predicted based on the performance index of the head node/source node in combination with the historical quality of service data. F is the set of all possible classification and regression trees (whose features may include, for example, in-band telemetry performance index data, port data, time characteristic data, and label data), $f_k(x_i)$ is the predicted score (for the corresponding label type) of the leaf node obtained after the i-th sample $x_i$ is input into the k-th tree, K is the number of trees, and $\hat{y}_i$ is the predicted classification result of the i-th sample. The final classification type is one of three types: normal, poor, and interrupted.
The objective function obj(θ) of the XGBoost algorithm may be expressed as:
$$obj(\theta) = \sum_{i=1}^{n} l(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k) \qquad (6)$$
In the above formula (6), $l(y_i, \hat{y}_i)$ is the error function of the XGBoost algorithm, $y_i$ represents the label data of the i-th sample, and $\sum_{k=1}^{K} \Omega(f_k)$ is the regularization term of the algorithm, which represents the total complexity of the K trees, where K is a positive integer greater than or equal to 1, and k is a positive integer greater than or equal to 1 and less than or equal to K. When the XGBoost algorithm is applied, the objective function needs to be minimized.
The result sending module 2100 in the embodiment of fig. 2 may be configured to send the end-to-end service quality result, the fault location result, and the quality-difference definition segment result to an SDN (Software Defined Network) orchestrator to drive an external system for path optimization or for proposing a service quality evaluation requirement, where the SDN orchestrator and other external systems may not be included in the system 200 provided in the embodiment of the present disclosure.
In the embodiment of the present disclosure, as shown in fig. 2, in the system architecture for implementing in-band telemetry fault location, in the quality evaluation and fault location process, the modules cooperate with each other, for example:
1. The end-to-end quality evaluation module 260 and the fault point location module 270 may both invoke the fault risk calculation module 280. For example, the same XGBoost algorithm may be used for both, while the input features and parameters used during training and use differ between the end-to-end service quality classification evaluation and the hop-by-hop link service quality classification evaluation.
2. The data storage module 250 may store end-to-end port configuration data (such as a source port ID (identity) and a destination port ID), hop-by-hop link port configuration information (such as a source port node IP and a sink port node IP), and link definition segment information (that is, the quality difference definition segment result, which covers poor quality or interruption, such as the AS (autonomous system) domain of a faulty link), and may provide the stored data to the end-to-end quality evaluation module 260, the fault point location module 270, and the fault definition segment module 290, respectively.
The above end-to-end port data and the hop-by-hop link port data also include port rates, but the port rates do not belong to the configuration data.
3. The in-band telemetry instance setting module 210 may, based on the end-to-end quality of service result obtained by the end-to-end quality evaluation module 260, modify the detection mode to the hop-by-hop mode when the end-to-end quality of service result is interruption or poor quality.
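The interplay between the end-to-end result and the detection mode can be sketched as below. The enum and function names are hypothetical; the sketch only encodes the stated rule that a poor or interrupted result (the first end-to-end service state) switches detection to hop-by-hop mode, and it leaves the mode unchanged otherwise.

```python
from enum import Enum


class Quality(Enum):
    NORMAL = "normal"
    POOR = "poor"
    INTERRUPTED = "interrupted"


class DetectionMode(Enum):
    END_TO_END = "end-to-end"
    HOP_BY_HOP = "hop-by-hop"


def is_first_state(result: Quality) -> bool:
    """First end-to-end service state: poor quality or interruption."""
    return result in (Quality.POOR, Quality.INTERRUPTED)


def next_detection_mode(result: Quality, current: DetectionMode) -> DetectionMode:
    """Switch to hop-by-hop detection when the end-to-end result is degraded."""
    return DetectionMode.HOP_BY_HOP if is_first_state(result) else current


print(next_detection_mode(Quality.POOR, DetectionMode.END_TO_END))
print(next_detection_mode(Quality.NORMAL, DetectionMode.END_TO_END))
```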
Fig. 3 schematically shows a flow chart of a service fault location method according to another embodiment of the present disclosure.
The process of implementing end-to-end service quality evaluation by the end-to-end service quality evaluation module in the embodiment of the present disclosure may be as shown in fig. 3.
In the embodiment of the disclosure, end-to-end service quality evaluation can be performed based on end-to-end in-band telemetry performance index data obtained by calculation of the formulas (1) to (4), an XGBoost algorithm (other algorithms can be applied, and the algorithm is taken as an example in the disclosure) is trained based on historical label data to obtain an end-to-end service quality classification evaluation model, the end-to-end service quality evaluation module calls the end-to-end service quality classification evaluation model in the fault risk calculation module to perform end-to-end service quality classification, and an end-to-end service quality type, namely whether the end-to-end service quality is normal, poor or interrupted, is evaluated to determine an end-to-end service quality result.
The historical tag data may include, for example, an interruption tag, a poor quality tag, and a normal tag, where the interruption tag may include a case where the packet loss rate is 100% and a case where the packet loss rate does not reach 100% but a customer complaint is received.
As shown in fig. 3, the following steps may be included.
Step S1, querying whether service flow data exists both from the end-to-end head node to the tail node and from the tail node to the head node, so as to judge whether the bidirectional delay needs to be calculated; if service flow data exists in both directions, performing step S2, and if it exists in only one direction, performing step S3.
Step S2, inputting end-to-end in-band telemetry performance index data, port data, and time characteristic data (including holiday time characteristics and busy/idle time characteristics of normal working days), and jumping to step S4.
The end-to-end in-band telemetry performance indicator data in step S2 may include, for example, an end-to-end packet loss rate, an end-to-end one-way delay (which may be used to determine which direction of delay is problematic), an end-to-end two-way delay, a head port bandwidth utilization rate, a destination port bandwidth utilization rate, a packet sending number (i.e., a number of messages sent or a number of packets sent), and a packet receiving number (i.e., a number of messages received or a number of packets received).
The port data may include port data of the head node device and port data of the destination node device, and the port data may include any one or more of a port type, a port name, a port rate, and the like.
Step S3, inputting end-to-end in-band telemetry performance index data, port data, and time characteristic data (including holiday time characteristics and busy/idle time characteristics of normal working days), and jumping to step S4.
The end-to-end in-band telemetry performance index data in step S3 may include, for example, an end-to-end packet loss rate, an end-to-end one-way delay, a head port bandwidth utilization rate, a destination port bandwidth utilization rate, a packet sending number, and a packet receiving number.
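The difference between steps S2 and S3 is only whether the bidirectional delay is part of the input. A minimal sketch of that selection is given below; the key names are hypothetical, and the port data and time characteristic data that accompany these indexes are omitted for brevity.

```python
BASE_KEYS = [
    "loss_rate",
    "one_way_delay",
    "head_port_utilization",
    "dest_port_utilization",
    "packets_sent",
    "packets_received",
]


def select_performance_features(metrics: dict, bidirectional: bool) -> dict:
    """Pick the step S2 inputs (bidirectional traffic) or step S3 inputs."""
    keys = list(BASE_KEYS)
    if bidirectional:
        # Step S2 additionally uses the end-to-end bidirectional delay.
        keys.append("bidirectional_delay")
    return {key: metrics[key] for key in keys}


# Made-up sample values, for illustration only.
metrics = {
    "loss_rate": 0.0, "one_way_delay": 30, "bidirectional_delay": 55,
    "head_port_utilization": 0.25, "dest_port_utilization": 0.5,
    "packets_sent": 100, "packets_received": 100,
}
print(sorted(select_performance_features(metrics, bidirectional=False)))
```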
In steps S2 and S3, corresponding end-to-end performance index data with a history label, device port data, and time characteristic data are input to perform end-to-end service quality classification evaluation model training, and a trained end-to-end service quality classification evaluation model is obtained.
Step S4, performing normalization processing (nominal type feature coding) on the port data and time feature data of the head and tail node devices, and performing standardization processing on the end-to-end in-band telemetry performance index data based on a min-max (minimum-maximum) algorithm. The min-max algorithm is as follows:
$$x^{*} = \frac{x - \min}{\max - \min} \qquad (7)$$
where $x^{*}$ is the standardized result of a single item of end-to-end in-band telemetry performance index data (such as the end-to-end packet loss rate), x is the latest measured value of that item, max is the maximum value of its historical data, and min is the minimum value of its historical data.
Since the port data and the time characteristic data of the head and tail node devices are non-numerical, they are converted into nominal type feature codes before being applied to the algorithm model. For example, the different port types of the head and tail node devices are converted into 0, 1, 2, 3, ..., holiday time is expressed as 1, a normal working day is expressed as 0, and so on.
Because the in-band telemetry performance index data, such as the packet loss rate, delay, bandwidth utilization, and packet counts, differ in units and vary widely in magnitude, they need to be unified to the interval 0-1, which reduces the influence of very large or very small values on the result.
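Both preprocessing operations of step S4 are simple enough to sketch directly. The function names are hypothetical, the port type strings are illustrative, and returning 0.0 when the historical maximum equals the minimum is an assumption, since the disclosure does not cover that case.

```python
def min_max(x: float, hist_min: float, hist_max: float) -> float:
    """Min-max scaling of formula (7): (x - min) / (max - min)."""
    if hist_max == hist_min:
        return 0.0  # assumption: a constant history maps to 0
    return (x - hist_min) / (hist_max - hist_min)


def nominal_codes(values) -> dict:
    """Map each distinct category to 0, 1, 2, ... in order of first appearance."""
    codes = {}
    for value in values:
        codes.setdefault(value, len(codes))
    return codes


print(min_max(5, 0, 10))
print(nominal_codes(["GE", "10GE", "GE", "100GE"]))
```

A new measurement that falls outside the historical range yields a value outside 0-1; whether such values are clipped is left open here.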
Step S5, calling the end-to-end service quality classification evaluation model of the fault risk calculation module to perform end-to-end service quality classification on the normalized port data and time characteristic data of the head and tail node devices and on the standardized end-to-end in-band telemetry performance index data, so as to obtain an end-to-end service quality evaluation classification result, namely the end-to-end service quality result.
Step S6, judging whether the output end-to-end service quality evaluation classification result is normal; if so, executing step S7; if the result is poor quality or interruption, executing step S8.
Step S7, outputting the normal end-to-end service quality evaluation classification result.
The end-to-end service quality classification evaluation model is obtained by training with the XGBoost algorithm, as follows. End-to-end historical in-band telemetry performance index data carrying label data (such as an end-to-end circuit interruption label, an end-to-end poor quality label, and an end-to-end normal label), historical port data of the head and tail node devices, and historical time characteristic data are input into the XGBoost model. Normalization processing (nominal type feature coding) is performed on the label data, the historical port data, and the historical time characteristic data, and standardization processing is performed on the end-to-end historical in-band telemetry performance index data based on the min-max algorithm. The min-max algorithm can refer to the above formula (7), in which case x* is the standardized result of a single item of end-to-end historical in-band telemetry performance index data (such as the historical end-to-end packet loss rate), x is the latest measured value of that item, max is the maximum value of its historical data (namely the maximum sample data value), and min is the minimum value of its historical data (namely the minimum sample data value).
The sample data in the sample data set can be divided into a training set and a test set. The end-to-end service quality classification evaluation model is trained on the training set based on the XGBoost algorithm, with the loss function adjusted to a minimum during training, and is then tested with the test set. After the end-to-end service quality classification evaluation model is put into online operation, parameter optimization can be carried out automatically.
The process of the fault point location module implementing fault point/problem point/fault location and the fault definition segment module implementing fault definition segment may continue with reference to fig. 3.
For the end-to-end service quality poor or interrupted condition, invoking a hop-by-hop link quality classification evaluation model of the fault risk calculation module to perform fault point positioning with a detection mode being a hop-by-hop mode, thereby positioning a specific fault point of the end-to-end circuit/link, which may include the following steps.
Step S8, when the end-to-end service quality is poor or interrupted, inputting in-band telemetry performance index data, port data, and time characteristic data (including holiday time characteristics and busy/idle time characteristics of normal working days) of each hop-by-hop link.
The in-band telemetry performance index data of the hop-by-hop link in step S8 may include, for example, a packet loss rate, a one-way delay, a two-way delay, a source port bandwidth utilization rate of the source node device, a sink port bandwidth utilization rate of the sink node device, a packet sending number, and a packet receiving number of the corresponding hop-by-hop link. The source port bandwidth utilization and the sink port bandwidth utilization may be referred to above in equation (1), i.e., source port bandwidth utilization = source port traffic divided by source port rate, and sink port bandwidth utilization = sink port traffic divided by sink port rate.
The port data of the hop-by-hop link may include port data of the source node device and port data of the sink node device, the port data may include port configuration data and port performance data, the port configuration data includes a port type, a port name, a node IP, a port number, and the like, and the port performance data mainly includes a port rate, that is, may include a source port rate of the source node device and a sink port rate of the sink node device.
The one-way delay of the hop-by-hop link and the two-way delay of the hop-by-hop link can be respectively calculated according to the following formulas:
Hop-by-hop link one-way delay = end-to-end service one-way delay / number of links (8)
Hop-by-hop link bidirectional delay = end-to-end service bidirectional delay / number of links (9)
The number of links in the above equations (8) and (9) refers to the number of links through which the target traffic passes, for example, the number of links =6 in fig. 2.
Step S9, with reference to step S4 above, similarly performing normalization processing (nominal type feature coding) on the port data of the source node device, the port data of the sink node device, and the time feature data of each hop-by-hop link, and performing standardization processing on the in-band telemetry performance index data of each hop-by-hop link based on the min-max algorithm.
Step S10, calling the hop-by-hop link quality classification evaluation model in the fault risk calculation module to perform hop-by-hop link service quality classification evaluation processing on the normalized port data and time characteristic data of each hop-by-hop link and on the standardized in-band telemetry performance index data of each hop-by-hop link, so as to obtain the hop-by-hop link service quality result of each hop-by-hop link.
The hop-by-hop link quality classification evaluation model can be obtained by training based on an XGboost algorithm, and the training process can comprise the following steps:
(1) The historical data of each hop-by-hop link when the end-to-end service quality is poor or interrupted may include label data of each hop-by-hop link (for example, may include a hop-by-hop link interruption label, a poor quality label, and a normal label), historical in-band telemetry performance index data (may include historical packet loss rate, historical one-way delay (or may also include historical two-way delay if the end-to-end is two-way), historical bandwidth utilization rate of a source port, historical bandwidth utilization rate of a sink port, number of sent messages, and number of received messages), historical port data (may include port type, port name, and/or port rate), and historical time characteristic data (including holiday time characteristics and/or busy time characteristics of a normal working day) as sample data of each hop-by-hop link, and is divided into a training set and a test set.
And carrying out normalization processing (nominal type feature coding) on the tag data, the historical port data and the historical time feature data of the hop-by-hop link, and carrying out standardization processing on the historical in-band telemetering performance index data of the hop-by-hop link based on a min-max algorithm. Wherein the min-max algorithm can refer to the above equation (7), when x in equation (7) * Is the standardized result of the historical in-band telemetering performance index data of the single hop-by-hop link, x is the latest measured value of the historical in-band telemetering performance index data of the single hop-by-hop link, max is the maximum value of the historical data of the historical in-band telemetering performance index data of the single hop-by-hop link (namely the maximum value of sample data of the hop-by-hop link), and min is the minimum value of the historical data of the historical in-band telemetering performance index data of the single hop-by-hop link (namely the minimum value of the sample data of the hop-by-hop link)
(2) A hop-by-hop link quality classification evaluation model is trained on the sample data in the training set based on the XGBoost algorithm, and the model is adjusted during training so that the loss function is minimized.
(3) Hop-by-hop link quality classification evaluation is performed on the test set based on the trained hop-by-hop link quality classification evaluation model.
Step S11, outputting the hop-by-hop link service quality result (interrupted/poor quality/normal) of each hop-by-hop link.
Step S12, returning link information of the faulty link, that is, of each hop-by-hop link whose hop-by-hop link service quality result is poor quality or interrupted (referred to as a first hop-by-hop service state). The link information may include, for example, the link ID of the faulty link and the source node device ID and sink node device ID corresponding to the faulty link, so as to locate the fault point.
A normal hop-by-hop link service quality result may be referred to as a second hop-by-hop service state.
Step S13, returning, according to the link ID, the source node device ID, and the sink node device ID of the faulty link, the source port IP of the source node device, the sink port IP of the sink node device, the source and sink port IPs of the corresponding faulty link, and the AS domain in which the faulty link is located, so as to define the fault segment.
For example, again taking fig. 2 as an example, assume that the link between router 2 and router 3 is identified as the faulty link; router 2 is then located as the fault point.
Step S14, outputting the end-to-end service quality result (poor quality or interrupted), the fault location result, and the poor-quality segment definition result.
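As a non-limiting illustration, steps S11 to S14 can be sketched as a lookup over the per-link classification results. The `Link` record, the `locate_faults` function, the state strings, and the example addresses and AS number are hypothetical; the disclosure only specifies which items of information are returned.

```python
from dataclasses import dataclass
from typing import Dict, List

# First hop-by-hop service state: poor quality or interrupted.
FAULT_STATES = {"poor", "interrupted"}


@dataclass
class Link:
    link_id: str
    source_node_id: str
    sink_node_id: str
    source_port_ip: str
    sink_port_ip: str
    as_domain: str


def locate_faults(results: Dict[str, str], topology: Dict[str, Link]) -> List[dict]:
    """Return fault point and fault segment information for every faulty link."""
    report = []
    for link_id, state in results.items():
        if state not in FAULT_STATES:
            continue  # second hop-by-hop service state (normal)
        link = topology[link_id]
        report.append({
            "link_id": link.link_id,                # S12: fault point location
            "state": state,
            "source_node_id": link.source_node_id,
            "sink_node_id": link.sink_node_id,
            "source_port_ip": link.source_port_ip,  # S13: fault segment definition
            "sink_port_ip": link.sink_port_ip,
            "as_domain": link.as_domain,
        })
    return report


# Hypothetical topology and classification results (S11).
topology = {
    "L1": Link("L1", "router1", "router2", "10.0.0.1", "10.0.0.2", "AS65001"),
    "L2": Link("L2", "router2", "router3", "10.0.1.1", "10.0.1.2", "AS65001"),
}
results = {"L1": "normal", "L2": "poor"}
faults = locate_faults(results, topology)
```

With these example inputs, only the link between router2 and router3 is reported, which mirrors the fig. 2 example in which router 2 is located as the fault point.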
The scheme provided by the embodiment of the disclosure can be applied to an SDN controller of a telecommunication network, the quality of end-to-end service is evaluated, and specific poor quality or fault node equipment and links can be further positioned for poor quality service, so that the SDN controller can carry out service quality guarantee and network optimization. The scheme provided by the embodiment of the disclosure can also be applied to customer service completion acceptance and provides a service quality SLA guarantee basis.
The scheme provided by the embodiment of the disclosure can provide an intelligent and automatic means for the maintenance and guarantee of the IP network, solves the problems of long operation time, complex operation, difficult multi-party coordination and the like of the existing network, avoids the influence on the customer service, greatly improves the operation efficiency, reduces the operation and maintenance cost and improves the customer perception; and an intelligent means can be provided for completion acceptance of the IP service, the SLA quality of the customer service is ensured, and unnecessary investment waste is avoided.
The following describes the processing procedure of implementing service quality evaluation and fault location by matching each functional module:
The data preprocessing module processes the data collected in the current acquisition period (in the end-to-end mode, one acquisition period; in the hop-by-hop mode, the current hop-by-hop link among the hop-by-hop links), calculates in-band telemetry performance index data such as bandwidth utilization rate, packet loss rate, and delay, and constructs time characteristic data. It then stores the in-band telemetry performance index data, the time characteristic data, the collected data, and the service ID of the target service in the data storage module.
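As a non-limiting illustration, the calculations performed by the data preprocessing module can be sketched as follows. The formulas are assumptions chosen to be consistent with the inputs named in this disclosure (port traffic and port rate, receive and send times, numbers of sent and received packets); the function names, the units, and the 09:00 to 18:00 busy window are hypothetical.

```python
from datetime import date, datetime
from typing import Dict, Set


def bandwidth_utilization(traffic_bits: float, port_rate_bps: float, period_s: float) -> float:
    """Fraction of port capacity used during one acquisition period (assumed formula)."""
    return traffic_bits / (port_rate_bps * period_s)


def one_way_delay_us(recv_time_us: int, send_time_us: int) -> int:
    """One-way delay: receive time at the sink port minus send time at the source port."""
    return recv_time_us - send_time_us


def packet_loss_rate(sent: int, received: int) -> float:
    """Share of sent packets that were not received; 0.0 when nothing was sent."""
    if sent == 0:
        return 0.0
    return (sent - received) / sent


def time_features(ts: datetime, holidays: Set[date]) -> Dict[str, int]:
    """Holiday flag and working-day busy-hour flag (the busy window is an assumption)."""
    is_holiday = int(ts.date() in holidays)
    is_busy = int(not is_holiday and ts.weekday() < 5 and 9 <= ts.hour < 18)
    return {"is_holiday": is_holiday, "is_busy_hour": is_busy}


# Hypothetical measurements for one acquisition period of 10 seconds.
utilization = bandwidth_utilization(3e9, 1e9, 10.0)
delay_us = one_way_delay_us(1_000_250, 1_000_000)
loss = packet_loss_rate(1000, 990)
features = time_features(datetime(2021, 9, 22, 10, 30), holidays={date(2021, 10, 1)})
```

The one-way delay computed this way is only meaningful if the clocks of the two node devices are synchronized, which this sketch takes for granted.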
The end-to-end service quality evaluation module takes the service ID as an index, reads end-to-end SLA data, end-to-end in-band telemetry performance index data corresponding to the current acquisition period, port data and time characteristic data from the data storage module, and sends the data and a calculation instruction to the fault risk calculation module.
After receiving the calculation instruction and the related data, the fault risk calculation module inputs the data into a pre-trained end-to-end service quality classification evaluation model, and sends the output end-to-end service quality result to the end-to-end service quality evaluation module.
The end-to-end service quality evaluation module determines the network quality according to the output end-to-end service quality result.
If the end-to-end service quality result is poor quality or interruption, the end-to-end service quality evaluation module sends a switching instruction to the in-band telemetry instance setting module and sends a working instruction to the fault point positioning module.
The in-band telemetry instance setting module first issues an end-to-end in-band telemetry instance to collect end-to-end performance indexes. When a problem (poor quality or interruption) occurs in the end-to-end service quality and the switching instruction is received, the module configures the detection mode of the in-band telemetry as the hop-by-hop mode and issues an in-band telemetry instance of the hop-by-hop mode through the network element configuration issuing module to collect hop-by-hop performance indexes, so that the hop-by-hop service quality can be evaluated and the segment containing the hop-by-hop problem point (poor quality/interruption) can be defined.
After receiving the working instruction, the fault point positioning module reads SLA data, in-band telemetry performance index data on the current hop-by-hop link, port data and time characteristic data from the data storage module by taking the service ID as an index, and sends the data and the calculation instruction to the fault risk calculation module.
After receiving the calculation instruction and the related data, the fault risk calculation module inputs the data into a pre-trained hop-by-hop link service quality classification evaluation model, and sends hop-by-hop link service quality results output by the hop-by-hop link service quality classification evaluation model to the fault point positioning module.
If the service quality result of the hop-by-hop link is poor or interrupted, the fault point positioning module sends the port information of the current hop-by-hop link to the fault definition segment module, wherein the port information of the hop-by-hop link refers to the position information of the segment needing to be defined, and comprises source port IP, sink port IP, AS domain and the like.
Whether the hop-by-hop traversal is complete is then determined. If the traversal is complete, the end-to-end service quality evaluation module is triggered to perform the end-to-end quality evaluation of the next acquisition period; if the traversal is not complete, the fault point positioning module continues with the fault risk calculation for the next hop-by-hop link.
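As a non-limiting illustration, the cooperation of the modules within one acquisition period can be sketched as a single control flow. The function `run_acquisition_period`, the callables passed to it, and the rule-based `threshold_classifier` are hypothetical; in particular, the threshold rules merely stand in for the pre-trained classification evaluation models and are not the models of this disclosure.

```python
from typing import Callable, Dict, List

FAULT_STATES = ("poor", "interrupted")


def run_acquisition_period(
    e2e_features: dict,
    collect_hop_features: Callable[[], Dict[str, dict]],
    classify_e2e: Callable[[dict], str],
    classify_hop: Callable[[dict], str],
) -> dict:
    """Evaluate end to end first; traverse hop-by-hop links only on poor/interrupted."""
    e2e_state = classify_e2e(e2e_features)
    if e2e_state not in FAULT_STATES:
        return {"e2e_state": e2e_state, "mode": "end-to-end", "fault_links": []}

    # Switching instruction: hop-by-hop data is collected only after the switch.
    fault_links: List[str] = []
    for link_id, features in collect_hop_features().items():
        if classify_hop(features) in FAULT_STATES:
            fault_links.append(link_id)
    return {"e2e_state": e2e_state, "mode": "hop-by-hop", "fault_links": fault_links}


def threshold_classifier(features: dict) -> str:
    """Stand-in for a pre-trained model: simple rules on the packet loss rate."""
    loss = features["loss_rate"]
    if loss >= 1.0:
        return "interrupted"
    return "poor" if loss > 0.01 else "normal"


result = run_acquisition_period(
    e2e_features={"loss_rate": 0.05},
    collect_hop_features=lambda: {
        "router1-router2": {"loss_rate": 0.0},
        "router2-router3": {"loss_rate": 0.05},
    },
    classify_e2e=threshold_classifier,
    classify_hop=threshold_classifier,
)
```

Passing the hop-by-hop collection as a callable reflects the order described above: the hop-by-hop in-band telemetry instance is issued only after the end-to-end result indicates poor quality or interruption.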
It should also be understood that the above description is intended only to assist those skilled in the art in better understanding the embodiments of the present disclosure, and is not intended to limit the scope of the embodiments of the present disclosure. Various equivalent modifications or changes will be apparent to those skilled in the art in light of the above examples; for example, some steps in the above methods may not be necessary, some steps may be newly added, or any two or more of the above embodiments may be combined. Such modifications, variations, or combinations are also within the scope of the embodiments of the present disclosure.
It should also be understood that the foregoing descriptions of the embodiments of the present disclosure have been provided with an emphasis on differences between the various embodiments, and the same or similar components that are not mentioned may be referenced with each other and will not be repeated here for the sake of brevity.
It should also be understood that the sequence numbers of the above-mentioned processes do not mean the execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation on the implementation process of the embodiment of the present disclosure.
It should also be understood that, in the various embodiments of the present disclosure, unless otherwise specified or logically conflicting, the terms and/or descriptions of the various embodiments are consistent and may be mutually referenced, and the technical features of the various embodiments may be combined to form a new embodiment according to their inherent logical relationships.
Examples of the service fault location method provided by the present disclosure are described in detail above. It is understood that, in order to realize the above functions, the computer device comprises hardware structures and/or software modules for performing the respective functions. Those of skill in the art will readily appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as hardware or as a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.
The service failure locating device provided by the present disclosure will be described below.
Fig. 4 schematically shows a schematic block diagram of a service fault location device according to an embodiment of the present disclosure.
As shown in fig. 4, the service failure location apparatus 400 provided by the embodiment of the present disclosure may include an end-to-end in-band telemetry performance index data obtaining unit 410, an end-to-end quality of service result obtaining unit 420, a hop-by-hop in-band telemetry performance index data obtaining unit 430, a hop-by-hop link quality of service result obtaining unit 440, and a failure link location unit 450.
The end-to-end in-band telemetry performance index data acquisition unit 410 may be configured to acquire end-to-end in-band telemetry performance index data and port data of the target service in the current acquisition period.
The end-to-end service quality result obtaining unit 420 may be configured to perform end-to-end service quality classification evaluation processing on the end-to-end in-band telemetry performance index data and the port data of the target service in the current acquisition period, so as to obtain an end-to-end service quality result.
The hop-by-hop in-band telemetry performance index data obtaining unit 430 may be configured to obtain in-band telemetry performance index data and port data of each hop-by-hop link of the target service in the current acquisition period if the end-to-end service quality result is the first end-to-end service state.
The hop-by-hop link quality of service result obtaining unit 440 may be configured to perform hop-by-hop link quality classification evaluation processing on the in-band telemetry performance index data and the port data of each hop-by-hop link of the target service in the current acquisition period, so as to obtain a hop-by-hop link quality of service result of each hop-by-hop link.
The faulty link positioning unit 450 may be configured to position a faulty link of the target service in the current acquisition period according to a result of the quality of service of the hop-by-hop links.
Other aspects of the embodiment of fig. 4 may be found in relation to other embodiments described above.
It should be understood that the end-to-end in-band telemetry performance indicator data acquisition unit 410 may be implemented by a transceiver, and the end-to-end quality of service result acquisition unit 420, the hop-by-hop in-band telemetry performance indicator data acquisition unit 430, the hop-by-hop link quality of service result acquisition unit 440, and the failed link location unit 450 may be implemented by a processor. The service fault locating device 400 may further include a storage unit, which may be implemented by a memory. The service fault locating apparatus shown in fig. 5 may be disposed on a computer device 500, and may include a processor 510, a memory 520, and a transceiver 530.
It should be understood that the above division of the units is only a functional division, and other division methods may be possible in actual implementation.
The embodiment of the present disclosure further provides a service fault location device, which includes a processor and an interface; the processor is configured to execute the service fault location method in any of the method embodiments.
It should be understood that the service fault locating device may be a chip. For example, the service fault locating device may be a Field-Programmable Gate Array (FPGA), an Application-Specific Integrated Circuit (ASIC), a System on Chip (SoC), a Central Processing Unit (CPU), a Network Processor (NP), a Digital Signal Processor (DSP), a Micro Controller Unit (MCU), a Programmable Logic Device (PLD), or another integrated chip.
In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in a processor or by instructions in the form of software. The steps of a method disclosed in connection with the embodiments of the present disclosure may be embodied directly in a hardware processor, or in a combination of hardware and software modules in a processor. The software modules may be located in a storage medium well known in the art, such as RAM, flash memory, ROM, PROM, EPROM, or registers. The storage medium is located in a memory, and a processor reads information in the memory and completes the steps of the method in combination with hardware of the processor. To avoid repetition, this is not described in detail here.
It should be noted that the processor in the embodiments of the present disclosure may be an integrated circuit chip having signal processing capability. In implementation, the steps of the above method embodiments may be performed by integrated logic circuits of hardware in a processor or by instructions in the form of software. The processor may be a general purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or perform the various methods, steps, and logic blocks disclosed in the embodiments of the present disclosure. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present disclosure may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. The software modules may be located in a storage medium well known in the art, such as RAM, flash memory, ROM, PROM, EPROM, or registers. The storage medium is located in a memory, and a processor reads information in the memory and completes the steps of the method in combination with hardware of the processor.
The embodiments of the present disclosure further provide a computer-readable medium, on which a computer program is stored, where the computer program is executed by a computer to implement the service fault location method in any of the above method embodiments.
The embodiment of the present disclosure further provides a computer program product, and when executed by a computer, the computer program product implements the service fault location method in any of the above method embodiments.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, the embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. The processes or functions according to the embodiments of the present disclosure are produced in whole or in part when the computer instructions are loaded and executed on a computer. The computer may be a general purpose computer, a special purpose computer, a network of computers, or another programmable device. The computer instructions may be stored on a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or a wireless manner (e.g., infrared, radio, microwave). The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device, such as a server or a data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a Digital Video Disk (DVD)), or a semiconductor medium (e.g., a Solid State Disk (SSD)), among others.
In the several embodiments provided in the present disclosure, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the unit is only one logical functional division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
In various embodiments of the present disclosure, terms and/or descriptions in different embodiments have consistency and may be mutually cited if not specifically stated or logically conflicting, and technical features in different embodiments may be combined to form a new embodiment according to their inherent logical relationships.
The above description is only for the specific embodiments of the present disclosure, but the scope of the present disclosure is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present disclosure, and all the changes or substitutions should be covered within the scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (14)

1. A method for locating a service fault is characterized by comprising the following steps:
acquiring end-to-end in-band telemetry performance index data and port data of a target service in a current acquisition period;
performing end-to-end service quality classification evaluation processing on the end-to-end in-band telemetry performance index data and the port data of the target service in the current acquisition period to obtain an end-to-end service quality result;
if the end-to-end service quality result is a first end-to-end service state, acquiring in-band telemetry performance index data and port data of each hop-by-hop link of the target service in the current acquisition period;
carrying out hop-by-hop link quality classification evaluation processing on the in-band telemetry performance index data and the port data of each hop-by-hop link of the target service in the current acquisition period to obtain a hop-by-hop link service quality result of each hop-by-hop link;
and positioning the fault link of the target service in the current acquisition period according to the hop-by-hop link service quality result of each hop-by-hop link.
2. The method of claim 1, wherein obtaining end-to-end in-band telemetry performance indicator data and port data of a target service during a current acquisition period comprises:
setting in-band telemetry instance configuration information, wherein the in-band telemetry instance configuration information comprises the target service, an acquisition period and a detection mode;
if the detection mode is an end-to-end mode, issuing an end-to-end in-band telemetry instance to the head node device through which the target service passes, so that the head node device activates the end-to-end in-band telemetry instance and collects in-band telemetry data of the head node device and the destination node device;
receiving the in-band telemetry data of the head node device and the destination node device;
and acquiring end-to-end in-band telemetry performance index data of the target service in the current acquisition period according to the in-band telemetry data of the head node device and the destination node device and the end-to-end port data of the target service in the current acquisition period.
3. The method of claim 2, wherein said end-to-end port data includes a head port rate of said head node device and a destination port rate of said destination node device;
the in-band telemetry data of the head node device and the destination node device includes destination port traffic of the destination node device, head port traffic of the head node device, destination port receiving time of the destination node device, head port sending time of the head node device, packet sending number of the head node device, and packet receiving number of the destination node device;
the end-to-end in-band telemetry performance index data comprises a head port bandwidth utilization rate of the head node device, a destination port bandwidth utilization rate of the destination node device, an end-to-end one-way time delay and an end-to-end packet loss rate;
wherein obtaining the end-to-end in-band telemetry performance index data of the target service in the current acquisition period according to the in-band telemetry data of the head node device and the destination node device and the end-to-end port data of the target service in the current acquisition period comprises:
obtaining the bandwidth utilization rate of the head port according to the head port flow and the head port rate;
obtaining the bandwidth utilization rate of the destination port according to the destination port flow and the destination port rate;
obtaining the end-to-end one-way time delay according to the receiving time of the destination port and the sending time of the head port;
and obtaining the end-to-end packet loss rate according to the packet sending number of the head node device and the packet receiving number of the destination node device.
4. The method of claim 3, wherein the in-band telemetry data of the head node device and the destination node device further comprises a transmit timestamp of the head node device and a receive timestamp of the destination node device;
performing end-to-end service quality classification evaluation processing on the end-to-end in-band telemetry performance index data and the port data of the target service in the current acquisition period to obtain an end-to-end service quality result, wherein the method comprises the following steps:
performing time characteristic processing on the receiving time stamp and the sending time stamp to obtain time characteristic data;
and calling an end-to-end service quality classification evaluation model to process the packet sending number of the head node device, the packet receiving number of the destination node device, the time characteristic data, the head port bandwidth utilization rate, the destination port bandwidth utilization rate, the end-to-end one-way time delay, the end-to-end packet loss rate and the end-to-end port data to obtain the end-to-end service quality result.
5. The method of claim 4, further comprising:
acquiring end-to-end historical in-band telemetry performance index data, historical port data, historical time characteristic data and an end-to-end service quality label of the target service in a historical acquisition period;
and training a first machine learning model by using the end-to-end historical in-band telemetry performance index data, the historical port data, the historical time characteristic data and the end-to-end service quality label of the target service in the historical acquisition period to obtain the end-to-end service quality classification evaluation model.
6. The method of claim 3, wherein obtaining end-to-end in-band telemetry performance indicator data of the target service in the current acquisition period according to in-band telemetry data of the head node device and the destination node device and end-to-end port data of the target service in the current acquisition period, further comprises:
if bidirectional service flow data exists between the head node device and the destination node device, acquiring the head port receiving time of the head node device and the destination port sending time of the destination node device;
obtaining an end-to-end bidirectional time delay according to the receiving time of the destination port, the sending time of the head port, the receiving time of the head port and the sending time of the destination port;
the end-to-end in-band telemetry performance indicator data further comprises the end-to-end bi-directional delay.
7. The method of claim 6, wherein the in-band telemetry data of the head node device and the destination node device further comprises a transmit timestamp of the head node device and a receive timestamp of the destination node device;
performing end-to-end service quality classification evaluation processing on the end-to-end in-band telemetry performance index data and the port data of the target service in the current acquisition period to obtain an end-to-end service quality result, wherein the method comprises the following steps:
performing time characteristic processing on the receiving time stamp and the sending time stamp to obtain time characteristic data;
and calling an end-to-end service quality classification evaluation model to process the packet sending number of the head node device, the packet receiving number of the destination node device, the time characteristic data, the head port bandwidth utilization rate, the destination port bandwidth utilization rate, the end-to-end one-way time delay, the end-to-end two-way time delay, the end-to-end packet loss rate and the end-to-end port data to obtain the end-to-end service quality result.
8. The method of claim 2, wherein if the end-to-end quality of service result is a first end-to-end service state, acquiring in-band telemetry performance index data and port data of each hop-by-hop link of the target service in the current acquisition period comprises:
if the end-to-end service quality result is the first end-to-end service state, adjusting a detection mode in the in-band telemetry instance configuration information from the end-to-end mode to a hop-by-hop mode, performing hop-by-hop detection on each node device passed by the target service, and acquiring in-band telemetry data and port data of each hop-by-hop link;
and acquiring in-band telemetry performance index data of each hop-by-hop link according to the in-band telemetry data of each hop-by-hop link.
9. The method of claim 1, wherein performing a hop-by-hop link quality classification evaluation process on the in-band telemetry performance index data and port data of each hop-by-hop link of the target service in the current acquisition period to obtain a hop-by-hop link service quality result of each hop-by-hop link comprises:
and calling a hop-by-hop link quality classification evaluation model to process the in-band telemetry performance index data and the port data of each hop-by-hop link, so as to obtain a hop-by-hop link service quality result of each hop-by-hop link.
10. The method of claim 1, wherein locating the failed link of the target service in the current acquisition period according to the hop-by-hop link quality of service result of each hop-by-hop link comprises:
taking the hop-by-hop link with the service quality result of the hop-by-hop link in the first hop-by-hop service state as the fault link;
acquiring a link identifier of the fault link and a source node device identifier and a sink node device identifier corresponding to the fault link;
and obtaining the network address of the source node device corresponding to the fault link according to the source node device identifier of the fault link, obtaining the network address of the sink node device corresponding to the fault link according to the sink node device identifier of the fault link, and determining the autonomous system domain where the fault link is located according to the link identifier of the fault link.
11. A service fault location device, comprising:
an end-to-end in-band telemetry performance index data acquisition unit, configured to acquire end-to-end in-band telemetry performance index data and port data of a target service in a current acquisition period;
an end-to-end service quality result obtaining unit, configured to perform end-to-end service quality classification evaluation processing on the end-to-end in-band telemetry performance index data and the port data of the target service in the current acquisition period to obtain an end-to-end service quality result;
a hop-by-hop in-band telemetry performance index data acquisition unit, configured to acquire in-band telemetry performance index data and port data of each hop-by-hop link of the target service in the current acquisition period if the end-to-end service quality result is a first end-to-end service state;
a hop-by-hop link service quality result obtaining unit, configured to perform hop-by-hop link quality classification evaluation processing on the in-band telemetry performance index data and the port data of each hop-by-hop link of the target service in the current acquisition period, so as to obtain a hop-by-hop link service quality result of each hop-by-hop link;
and the fault link positioning unit is used for positioning the fault link of the target service in the current acquisition period according to the hop-by-hop link service quality result of each hop-by-hop link.
12. A computer device, comprising:
at least one processor;
storage means for storing at least one program;
wherein the at least one program, when executed by the at least one processor, causes the at least one processor to implement the method of any one of claims 1-10.
13. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-10.
14. A computer program product comprising a computer program, characterized in that the computer program, when being executed by a processor, carries out the steps of the method according to any one of claims 1-10.
CN202111097969.1A 2021-09-18 2021-09-18 Service fault positioning method and related equipment Pending CN115842717A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111097969.1A CN115842717A (en) 2021-09-18 2021-09-18 Service fault positioning method and related equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111097969.1A CN115842717A (en) 2021-09-18 2021-09-18 Service fault positioning method and related equipment

Publications (1)

Publication Number Publication Date
CN115842717A (en) 2023-03-24

Family

ID=85574238

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111097969.1A Pending CN115842717A (en) 2021-09-18 2021-09-18 Service fault positioning method and related equipment

Country Status (1)

Country Link
CN (1) CN115842717A (en)

Similar Documents

Publication Publication Date Title
US20220086073A1 (en) Data packet detection method, device, and system
WO2018028573A1 (en) Method and device for fault handling, and controller
CN109495322A (en) Network failure locating method, relevant device and computer storage medium
CN112564964B (en) Fault link detection and recovery method based on software defined network
US7898971B2 (en) Method and apparatus for automating hub and spoke Internet Protocol Virtual Private Network trouble diagnostics
US20150304191A1 (en) Method and apparatus for automatically determining causes of service quality degradation
CN112866042B (en) Network quality detection method, device, computer equipment and computer readable medium
CN110830290B (en) Network topology generation method and server
CN108429625B (en) Method and device for realizing fault diagnosis
CN103416022A (en) In-service throughput testing in distributed router/switch architectures
CN112203172B (en) Special line opening method and device
CN108989128B (en) Fault positioning method and device based on networking structure
CN109964450B (en) Method and device for determining shared risk link group
US20220247651A1 (en) System and method for network and computation performance probing for edge computing
CN107769964B (en) Special line checking method and system
US20170104635A1 (en) Physical adjacency detection systems and methods
CN108494625A (en) A kind of analysis system on network performance evaluation
CN110896544B (en) Fault delimiting method and device
CN101431435B (en) Connection-oriented service configuration and management method
CN113810238A (en) Network monitoring method, electronic device and storage medium
CN115842717A (en) Service fault positioning method and related equipment
CN110620693A (en) Railway station route remote restart control system and method based on Internet of things
US20220390929A1 (en) Method, A System And A Computer Program Product For Monitoring An Industrial Ethernet Protocol Type Network
CN116248479A (en) Network path detection method, device, equipment and storage medium
CN107005440A (en) A kind of method of link failure positioning, apparatus and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination