CN117319256A - Network link fault diagnosis method and device - Google Patents

Network link fault diagnosis method and device

Publication number
CN117319256A
Authority
CN
China
Prior art keywords
address
network
network path
path
dial testing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211192269.5A
Other languages
Chinese (zh)
Inventor
周艳春
秦永钢
林雁青
刘家赫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Cloud Computing Technologies Co Ltd
Original Assignee
Huawei Cloud Computing Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Cloud Computing Technologies Co Ltd
Priority to PCT/CN2023/080443 (WO2023241122A1)
Publication of CN117319256A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0823 Errors, e.g. transmission errors
    • H04L43/0829 Packet loss
    • H04L43/0852 Delays
    • H04L43/50 Testing arrangements

Abstract

An embodiment of this application discloses a fault diagnosis method for a network link, which is used to improve the fault diagnosis effect for the network link. The method includes: a cloud server generates a dial testing task according to dial testing information configured by a user, where the dial testing information includes a source Internet Protocol (IP) address and a destination IP address; and the cloud server runs the dial testing task and displays a dial testing result, where the dial testing result includes at least one network path, the network path includes the devices between the device corresponding to the source IP address and the device corresponding to the destination IP address, and the at least one network path includes at least one of the following: a virtual network path or a physical network path.

Description

Network link fault diagnosis method and device
This application claims priority to Chinese patent application No. 202210690948.9, entitled "a network link failure diagnosis method, apparatus, and device" and filed with the China patent office on June 17, 2022, the entire contents of which are incorporated herein by reference.
Technical Field
The embodiment of the application relates to the field of computers, in particular to a network link fault diagnosis method and device.
Background
With the rapid development of cloud computing, many enterprises have migrated their services to cloud platforms, and virtual networks have become increasingly complex. Cloud service providers therefore offer verification tools for various virtual networks to help the operation and maintenance personnel of network links verify the virtual networks configured by tenants.
The virtual network path analysis offered by most cloud vendors is a static analysis based on the virtual network configuration. A network link check result obtained from such static analysis cannot reflect packet loss in the physical network, and the operation and maintenance personnel cannot learn in detail how traffic actually flows over the network link. As a result, the fault diagnosis effect for the network link is poor.
Disclosure of Invention
The embodiment of the application provides a network link fault diagnosis method and device, which are used for improving the fault diagnosis effect of a network link.
A first aspect of the embodiments of this application provides a fault diagnosis method for a network link. The method may be executed by a cloud server, by a component of the cloud server, for example, a processor, a chip, or a chip system of the cloud server, or by a logic module or software capable of implementing all or part of the functions of the cloud server. Taking execution by the cloud server as an example, the fault diagnosis method of the first aspect includes the following: The cloud server generates a dial testing task according to dial testing information configured by a user, where the dial testing information includes a source Internet Protocol (IP) address and a destination IP address. The cloud server runs the dial testing task and displays a dial testing result. The dial testing result includes at least one network path and information such as the message flow direction in the network path. The network path includes the devices between the device corresponding to the source IP address and the device corresponding to the destination IP address, and the at least one network path includes at least one of the following: a virtual network path or a physical network path.
In the embodiments of this application, the cloud server diagnoses faults on the network link from the source IP address to the destination IP address by running the dial testing task, and displays the dial testing results of the virtual network path and the physical network path of the network link to the user through a management interface. The dial testing results include information such as the real traffic flow direction on the network path. This improves the fault diagnosis effect for the network link, reduces the operation and maintenance difficulty for operation and maintenance personnel, and improves the fault diagnosis efficiency of the network link.
In one possible embodiment, the dial test information further includes one or more of the following: source port, destination port, transport protocol. The dial testing information also comprises dial testing speed and dial testing quantity, wherein the dial testing speed refers to the frequency of sending dial testing messages, and the dial testing quantity refers to the quantity of sending dial testing messages. Specifically, a user configures dial testing information through a management interface to create a dial testing task.
In the embodiments of this application, the cloud server generates the dial testing task based on one or more items of dial testing information, which improves the feasibility of creating the dial testing task.
In a possible implementation manner, the cloud server provides a management interface, and the management interface is used for displaying a transmission path of a network link, wherein the transmission path comprises a control plane path, a virtual network path and a physical network path. And the cloud server executes a dial testing task based on the network link to obtain a dial testing result of the network node between the source IP address and the destination IP address. And displaying the dial testing result of the path node in the transmission path in the management interface.
In the embodiments of this application, the cloud server displays the dial testing results of the control plane path, the virtual network path, and the physical network path of the network link to the user through the management interface, which reduces the operation and maintenance difficulty for operation and maintenance personnel and improves the fault diagnosis efficiency of the network link.
In a possible implementation manner, the cloud server acquires static resource information of the network link, wherein the static resource information comprises one or more of the following information: tenant virtual private cloud VPC, subnet, security group, routing table, port information, IP address and load balancing information. The cloud server establishes a control plane path of the network link based on the static resource information. Specifically, the cloud server acquires static resource information of a network link between a source IP address and a destination IP address, and establishes a control plane path between the source IP address and the destination IP address based on the static network resource information.
In the embodiments of this application, the cloud server can establish the control plane path of the network link based on the static resource information of the network link, which improves the feasibility of establishing the control plane path.
In a possible implementation manner, a first mapping relationship exists between the devices included in the control plane path and the devices included in the virtual network path, and a second mapping relationship exists between the devices included in the virtual network path and the devices included in the physical network path. The cloud server maps the control plane path to the virtual network path according to the first mapping relationship, where the first mapping relationship includes the mapping from control plane path nodes to virtual network path nodes, and displays the first mapping relationship on the management interface. The cloud server maps the virtual network path to the physical network path according to the second mapping relationship, where the second mapping relationship includes the mapping from virtual network path nodes to physical network path nodes, and displays the second mapping relationship on the management interface.
In the embodiment of the application, the cloud server displays the mapping relation from the control plane path node to the virtual network path node and the mapping relation from the virtual network path node to the physical network path node through the management interface, so that the operation and maintenance personnel are assisted in carrying out the fault analysis of the network link, and the fault diagnosis efficiency of the network link is improved.
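As an illustration only, the two mapping relationships can be sketched as two lookup tables applied in sequence. The node names, the dictionary representation, and the `map_path` helper below are hypothetical and are not taken from the application; the sketch merely shows how a control plane path could be translated into a virtual network path and then into a physical network path.

```python
# Hypothetical node names; the application does not list concrete devices here.
control_to_virtual = {          # first mapping relationship (assumed shape)
    "vpc-subnet-a": "vswitch-1",
    "route-table": "vrouter-1",
    "vpc-subnet-b": "vswitch-2",
}
virtual_to_physical = {         # second mapping relationship (assumed shape)
    "vswitch-1": "compute-node-1",
    "vrouter-1": "gateway-1",
    "vswitch-2": "compute-node-2",
}


def map_path(path, mapping):
    """Translate each node of a path through one mapping relationship."""
    missing = [node for node in path if node not in mapping]
    if missing:
        raise KeyError(f"no mapping for nodes: {missing}")
    return [mapping[node] for node in path]


control_plane_path = ["vpc-subnet-a", "route-table", "vpc-subnet-b"]
virtual_path = map_path(control_plane_path, control_to_virtual)
physical_path = map_path(virtual_path, virtual_to_physical)
print(virtual_path)
print(physical_path)
```

Keeping the two mappings separate mirrors the description above, in which each mapping relationship is displayed on the management interface on its own.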
In one possible implementation, the cloud server prompts the failure location based on the network path. Specifically, the cloud server can display the position of the fault point in the network path through the management interface, and the network path comprises a control plane path, a virtual network path and a physical network path.
In the embodiments of this application, the cloud server prompts the fault location in the network path through the management interface, which helps operation and maintenance personnel analyze faults on the network link and improves the fault diagnosis efficiency of the network link.
In one possible implementation, the cloud server provides at least one fault-clearing recommendation. When the dial testing result indicates that the network link fails, the cloud server marks the abnormal dial testing result on the management interface and prompts fault information, such as the number of faults and fault details of the network link, wherein the fault details comprise fault positions and fault troubleshooting suggestions. And the cloud server generates a network fault diagnosis result and fault information according to the dial testing result, and displays the network fault diagnosis result and the network fault information through the management interface.
In the embodiments of this application, the cloud server can determine fault troubleshooting suggestions based on the dial testing result, which improves the fault diagnosis efficiency of the network link.
In a possible implementation manner, the virtual network path includes a virtual device between a device corresponding to the source IP address and a device corresponding to the destination IP address, and the physical network path includes a physical device between the device corresponding to the source IP address and the device corresponding to the destination IP address. The cloud server displays the virtual devices in the virtual network path and the physical devices in the physical network path through the management interface.
In the embodiments of this application, the cloud server displays the virtual devices in the virtual network path and the physical devices in the physical network path to the user through the management interface, which improves the efficiency with which operation and maintenance personnel diagnose faults on the network link.
In one possible implementation, the dial test result further includes a packet loss rate or a delay of one or more devices. And the cloud server displays the dial testing result through the management interface.
In the embodiments of this application, the cloud server displays the dial testing result through the management interface, which improves the fault diagnosis efficiency of the network link.
In a possible implementation manner, one or more transmission paths corresponding to the network service exist between the source IP address and the destination IP address, and the cloud server can implement dial testing of the overlay network service based on the dial testing task.
In the embodiments of this application, the cloud server obtains the traffic path of any service based on analysis of the static network path, thereby implementing dial testing of overlay network services and improving the fault diagnosis efficiency of the network link.
In a possible implementation manner, in the process of executing a dial testing task based on a network link to obtain a dial testing result of a path node, the cloud server sends the dial testing task to an initial node, the dial testing task is used for indicating the initial node to generate a dial testing message and transmitting the dial testing message to a destination node, the initial node is the path node corresponding to a source IP address, and the destination node is the path node corresponding to the destination IP address. The cloud server acquires mirror image messages of each path node between the starting node and the destination node, wherein the mirror image messages are mirror images of dial-up test messages. And the cloud server analyzes the dial testing result of the path node according to the mirror image message.
In the embodiments of this application, the cloud server executes the dial testing task based on real dial test messages to detect the reachability of the network path. Network path dial testing based on real dial test messages does not need to be aware of the traffic type, so the dial test traffic is decoupled from the service traffic type, which improves the fault diagnosis efficiency of the network link.
In a possible implementation manner, the dial test message includes a dial test task identifier ID and a differentiated services code point DSCP, and in the process that the cloud server obtains the mirror image message of each path node between the starting node and the destination node, the dial test message of each path node between the starting node and the destination node is identified according to the dial test task ID and the DSCP. And the cloud server generates a mirror image message according to the identified dial-up message.
In the embodiments of this application, a real dial test message is injected at the start node of the network path, messages are matched by their dyeing mark, and the matched mirror image messages are reported to the fault diagnosis module to detect which network nodes the dial test message has passed through, thereby verifying the reachability of the network path. Matching messages by the dyeing mark also improves the feasibility of generating the mirror image messages.
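The matching step can be sketched as follows. This is a minimal illustration under stated assumptions: the `MirrorMessage` record, its field names, and both helper functions are hypothetical, and the application does not specify how mirror image messages are encoded. The sketch filters reported mirrors by dial testing task ID and DSCP, then reports which path nodes the dial test message reached and the first node that did not report it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MirrorMessage:
    node: str       # path node that reported the mirror (hypothetical field)
    task_id: int    # dial testing task ID carried in the message
    dscp: int       # DSCP value used as the dyeing mark


def nodes_traversed(path, mirrors, task_id, dscp):
    """Return the path nodes, in path order, that mirrored a matching dial test message."""
    seen = {m.node for m in mirrors if m.task_id == task_id and m.dscp == dscp}
    return [node for node in path if node in seen]


def first_unreached(path, mirrors, task_id, dscp):
    """Return the first path node with no matching mirror, or None if every node reported."""
    seen = set(nodes_traversed(path, mirrors, task_id, dscp))
    for node in path:
        if node not in seen:
            return node
    return None


path = ["compute-node-1", "switch-1", "gateway-1", "compute-node-2"]
mirrors = [
    MirrorMessage("compute-node-1", task_id=7, dscp=46),
    MirrorMessage("switch-1", task_id=7, dscp=46),
    MirrorMessage("gateway-1", task_id=8, dscp=46),   # belongs to another task
]
print(nodes_traversed(path, mirrors, task_id=7, dscp=46))
print(first_unreached(path, mirrors, task_id=7, dscp=46))
```

In this toy input the mirror from `gateway-1` carries a different task ID, so it is ignored and `gateway-1` is reported as the first node the message did not reach.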
A second aspect of the embodiments of this application provides a fault diagnosis device for a network link, which includes a transceiver unit and a processing unit. The processing unit is configured to generate a dial testing task according to dial testing information configured by a user, where the dial testing information includes a source Internet Protocol (IP) address and a destination IP address. The processing unit is further configured to run the dial testing task and display a dial testing result, where the dial testing result includes at least one network path, the network path includes the devices between the device corresponding to the source IP address and the device corresponding to the destination IP address, and the at least one network path includes at least one of the following: a virtual network path or a physical network path.
In one possible embodiment, the dial test information further includes one or more of the following: source port, destination port, transport protocol.
In a possible implementation manner, a mapping relationship exists between a device included in the virtual network path and a device included in the physical network path.
In a possible implementation, the processing unit is further configured to prompt the fault location based on the network path.
In a possible embodiment, the processing unit is further adapted to provide at least one fault-clearing recommendation.
In a possible implementation manner, the virtual network path includes a virtual device between a device corresponding to the source IP address and a device corresponding to the destination IP address, and the physical network path includes a physical device between the device corresponding to the source IP address and the device corresponding to the destination IP address.
In one possible implementation, the dial test result further includes a packet loss rate or a delay of one or more devices.
In a possible implementation manner, the transceiver unit is configured to obtain static resource information of the network link, where the static resource information includes one or more of the following information: tenant virtual private cloud VPC, subnet, security group, routing table, port information, IP address and load balancing information. The processing unit is further configured to establish a control plane path of the network link based on the static resource information.
A third aspect of the embodiments of the present application provides a cluster of computer devices, including at least one computing device, each computing device including a processor, where the processor of the at least one computing device is configured to execute instructions stored in a memory of the at least one computing device, so that the cluster of computing devices performs the method according to the first aspect or any one of the possible implementation manners of the first aspect.
A fourth aspect of the embodiments of the present application provides a computer-readable storage medium having stored thereon instructions that, when executed, cause a computer to perform the method of the first aspect or any of the possible implementation manners of the first aspect.
A fifth aspect of the embodiments of the present application provides a computer program product, which includes instructions, wherein the instructions, when executed, cause a computer to implement the method according to the first aspect or any one of the possible implementation manners of the first aspect.
It should be appreciated that any of the above-mentioned advantages achieved by the computer device cluster, the computer-readable medium, or the computer program product may refer to the advantages of the corresponding method, and will not be described herein.
Drawings
FIG. 1 is a schematic system architecture diagram of a fault diagnosis system for a network link according to an embodiment of the present application;
FIG. 2 is a flow chart of a fault diagnosis method for a network link according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a management interface according to an embodiment of the present application;
FIG. 4a is a schematic diagram of another management interface according to an embodiment of the present application;
FIG. 4b is a schematic diagram of another management interface according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a physical network path according to an embodiment of the present application;
FIG. 6a is a schematic diagram of another management interface according to an embodiment of the present application;
FIG. 6b is a schematic diagram of another management interface according to an embodiment of the present application;
FIG. 7 is a schematic structural diagram of a fault diagnosis device for a network link according to an embodiment of the present application;
FIG. 8 is a schematic structural diagram of a computer device according to an embodiment of the present application;
FIG. 9 is a schematic structural diagram of a computer device cluster according to an embodiment of the present application;
FIG. 10 is a schematic structural diagram of another computer device cluster according to an embodiment of the present application.
Detailed Description
The embodiment of the application provides a network link fault diagnosis method and device, which are used for improving the fault diagnosis effect of a network link.
The terms "first," "second," "third," "fourth" and the like in the description and in the claims of this application and in the above-described figures, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments described herein may be implemented in other sequences than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
In the embodiments of the present application, words such as "exemplary" or "such as" are used to mean serving as examples, illustrations, or descriptions. Any embodiment or design described herein as "exemplary" or "for example" should not be construed as preferred or advantageous over other embodiments or designs. Rather, the use of words such as "exemplary" or "such as" is intended to present related concepts in a concrete fashion.
First, some terms related to embodiments of the present application are introduced to facilitate understanding of the solution by those skilled in the art.
Dial test is a network diagnosis mode for simulating a user to send a message to test whether communication can be established between two nodes, thereby verifying whether a network between the two nodes to be dial tested is normal.
The full link refers to a comprehensive one-stop network fault demarcation and locating system that is built around topology and combines the three-layer mapping of the virtual network control plane, the virtual network data plane, and the physical network.
Message dyeing refers to IP message dyeing statistics, an IP network performance statistics technology that directly marks service messages to achieve accurate end-to-end or segment-by-segment packet loss measurement for IP messages.
The control plane controls and manages the operation of all network protocols and provides the network information and forwarding table entries that the data plane needs.
The IP quintuple refers to a source IP address, a source port, a destination IP address, a destination port, and a transport layer protocol.
The differentiated services code point (DSCP) is a quality of service (QoS) classification standard of differentiated services (DiffServ). It uses 6 bits of the type of service (TOS) byte in the IP header of each packet, leaving the other 2 bits unused, and distinguishes priorities by the encoded value.
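The bit layout described above can be made concrete with a short sketch: the DSCP value occupies the upper 6 bits of the TOS byte, so it is recovered by shifting the byte right by 2 bits. The function names are illustrative and do not come from the application.

```python
def dscp_from_tos(tos: int) -> int:
    """Return the 6-bit DSCP value held in the upper bits of the TOS byte."""
    if not 0 <= tos <= 0xFF:
        raise ValueError("TOS must fit in one byte")
    return tos >> 2


def tos_from_dscp(dscp: int, low_bits: int = 0) -> int:
    """Build a TOS byte from a 6-bit DSCP value and the 2 remaining low bits."""
    if not 0 <= dscp <= 0x3F:
        raise ValueError("DSCP must fit in 6 bits")
    if not 0 <= low_bits <= 0x3:
        raise ValueError("low_bits must fit in 2 bits")
    return (dscp << 2) | low_bits


print(dscp_from_tos(0xB8))   # TOS byte 0xB8 carries DSCP 46
print(hex(tos_from_dscp(46)))
```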
Virtual networks refer to networks formed by virtual machines running on a single physical machine that are interconnected to send and receive data to and from each other. The virtual machine may be connected to a virtual network created when the network is added.
The following describes the network link fault diagnosis method and device provided by the embodiments of the present application with reference to the accompanying drawings.
Referring to FIG. 1, FIG. 1 is a schematic system architecture diagram of a fault diagnosis system for a network link according to an embodiment of the present application. In the example shown in FIG. 1, the network link fault diagnosis system 100 includes a full link display module 101, a fault diagnosis module 102, and physical devices 103. The fault diagnosis module 102 includes a control path restoration submodule 1021, a virtual path restoration submodule 1022, a dial testing task management submodule 1023, a fault diagnosis application programming interface (API) 1024, a dial testing result aggregation submodule 1025, and a dial testing result analysis submodule 1026. The physical devices 103 include one or more devices, such as a physical switch 1031, a computing node 1032, and a gateway 1033. The specific functions of the various parts of the network link fault diagnosis system 100 are described in detail below.
The full link display module 101 is configured to display the fault diagnosis results of the three-layer network link, which includes a control plane path, a virtual network path, and a physical network path. The full link display module 101 is further configured to interact with the user and receive a fault diagnosis task of a network link created by the user, including receiving, through a display interface, the dial testing information of the network link to be diagnosed. The dial testing information of the network link includes a source IP address and a destination IP address. In one possible implementation, the dial testing information of the network link further includes a source port, a destination port, and a transport layer protocol.
The fault diagnosis module 102 is configured to generate a dial testing task according to dial testing information input by a user, aggregate and analyze dial testing results of a network link to generate a fault diagnosis result, and generate a fault troubleshooting suggestion corresponding to the fault diagnosis result.
The control path restoration submodule 1021 is configured to establish the control plane path of the network link according to the static resource information of the tenant, and to verify the reachability of the tenant's data flow on the control plane path under the static resource configuration. The virtual path restoration submodule 1022 is configured to generate a virtual path according to the control plane path and provide virtual path information for the dial testing task. The dial testing task management submodule 1023 is configured to generate a dial testing task according to the dial testing information input by the user and send the dial testing task to the start node of the dial testing path. The fault diagnosis API 1024 is configured to provide an external application programming interface for the fault diagnosis module 102, through which fault diagnosis tasks are received and dial testing tasks are sent.
The dial test result aggregation sub-module 1025 is configured to receive the mirror image message reported by each physical device 103, and generate a dial test result of each physical device 103 according to the reported mirror image message, where the dial test result includes related information in a forwarding physical path of the dial test message, for example, packet loss rate and delay data of each physical device 103. The dial test result aggregation sub-module 1025 is further configured to add a dial test result to the virtual path according to the mirror message of the physical device 103, thereby returning a complete virtual path dial test result.
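A rough sketch of such aggregation is shown below. It rests on assumptions that the application does not state: each dial test message carries a sequence number, the send time of each message is known, and each mirror image message is reported as a (device, sequence number, timestamp) tuple. The `aggregate` function and its data shapes are hypothetical.

```python
from collections import defaultdict


def aggregate(sent_count, send_times, mirrors):
    """
    sent_count: number of dial test messages sent by the start node
    send_times: {seq: send timestamp in seconds}
    mirrors:    iterable of (device, seq, mirror timestamp in seconds)
    Returns {device: {"loss_rate": float, "avg_delay_s": float or None}}.
    Devices that reported no mirror at all do not appear in the result.
    """
    arrivals = defaultdict(dict)
    for device, seq, ts in mirrors:
        arrivals[device].setdefault(seq, ts)   # ignore duplicate mirrors of one message
    result = {}
    for device, by_seq in arrivals.items():
        delays = [ts - send_times[seq] for seq, ts in by_seq.items() if seq in send_times]
        result[device] = {
            "loss_rate": 1 - len(by_seq) / sent_count,
            "avg_delay_s": sum(delays) / len(delays) if delays else None,
        }
    return result


send_times = {0: 0.0, 1: 0.5, 2: 1.0, 3: 1.5}
mirrors = [
    ("switch-1", 0, 0.25), ("switch-1", 1, 0.75), ("switch-1", 2, 1.25), ("switch-1", 3, 1.75),
    ("gateway-1", 0, 0.5), ("gateway-1", 1, 1.0),   # two messages never reached the gateway
]
report = aggregate(4, send_times, mirrors)
print(report)
```

Here the per-device loss rate is measured against the number of messages sent by the start node, so a rising loss rate along the path points at the segment where messages disappear.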
The dial testing result analysis submodule 1026 is configured to determine whether a network fault exists according to the dial testing result generated by the dial testing result aggregation submodule 1025, and if the network fault exists, send an alarm message to the full-link display module 101 and generate a cause analysis and fault troubleshooting suggestion of the network link fault.
The physical devices 103 are configured to generate a real dial test message according to the dial testing task sent by the fault diagnosis module 102 and send the dial test message to the source node. Each physical device 103 on the forwarding path of the dial test message from the source node to the destination node can generate a mirror image message of the dial test message and send it to the fault diagnosis module 102. The physical devices 103 are further configured to dye the dial test message before it is sent to the source node, so that each physical device 103 on the forwarding path can identify it.
The full-link display module 101 and the fault diagnosis module 102 in the network link fault diagnosis system 100 may be deployed in one computing device or a computing device cluster formed by a plurality of computing devices, where the computing devices are used to provide cloud services for diagnosing network link faults for users, and the computing devices and the computing device cluster may be collectively referred to as a cloud server.
Referring to fig. 2, fig. 2 is a flow chart of a fault diagnosis method for a network link according to an embodiment of the present application. In the illustration in fig. 2, the fault diagnosis method provided in the embodiment of the present application includes, but is not limited to, the following steps:
201. and the cloud server generates a dial testing task according to dial testing information configured by the user.
The cloud server generates a dial testing task according to dial testing information configured by a user, wherein the dial testing information comprises one or more of the following information: source IP address, destination IP address, source port, destination port, and transport protocol. Specifically, the cloud server provides a management interface for a user, the user can input dial testing information through the management interface provided by the cloud server, a dial testing task is created on the management interface based on the dial testing information, after the cloud server receives the creation message of the dial testing task, the dial testing task is generated according to the dial testing information, and the dial testing task is sent to equipment corresponding to the source IP address.
In the embodiment of the application, one or more transmission paths corresponding to a network service exist between the source IP address and the destination IP address; that is, the cloud server can perform traffic dial testing for any overlay network service. Examples of such network services include virtual private cloud (VPC) services and Ethernet network processor (ENP) services.
In a possible implementation manner, when the user creates the dial testing task through the management interface, the name of the dial testing task, the dial testing rate, and the dial testing quantity can also be set through the management interface, where the dial testing rate is the number of dial test messages sent per second, and the dial testing quantity is the total number of dial test messages.
Referring to fig. 3, fig. 3 is a schematic diagram of a management interface for creating a dial testing task according to an embodiment of the present application. In the example shown in fig. 3, a user may enter, through the management interface, one or more items of dial testing information: a task name, a protocol type, message details (a source IP address, a source ID, a destination IP address, and a destination ID), a dial testing rate, and a dial testing quantity.
For example, through the management interface the user may enter the task name "task20220914", select the protocol type ICMP, and enter the source IP address "192.168.10.1", the destination IP address "192.168.10.2", the dial testing rate "2PPS", and the dial testing quantity "100".
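The dial testing information above can be pictured as a small validated record. The following is only an illustrative sketch in Python: the class name DialTestTask, its field names, and the duration_seconds helper are assumptions introduced here and are not defined by the application; the field values are the ones from the example above.

```python
from dataclasses import dataclass
from ipaddress import ip_address
from typing import Optional


@dataclass(frozen=True)
class DialTestTask:
    """Hypothetical container for the dial testing information entered by the user."""
    name: str
    protocol: str
    src_ip: str
    dst_ip: str
    rate_pps: int                   # dial testing rate, in messages per second
    count: int                      # dial testing quantity, in messages
    src_port: Optional[int] = None  # optional, e.g. not used for ICMP
    dst_port: Optional[int] = None

    def __post_init__(self):
        # ip_address() raises ValueError for a malformed address.
        ip_address(self.src_ip)
        ip_address(self.dst_ip)
        if self.rate_pps <= 0 or self.count <= 0:
            raise ValueError("rate and quantity must be positive")
        for port in (self.src_port, self.dst_port):
            if port is not None and not 0 < port < 65536:
                raise ValueError("port out of range")

    def duration_seconds(self) -> float:
        """Time needed to send all messages at the configured rate."""
        return self.count / self.rate_pps


task = DialTestTask(
    name="task20220914",
    protocol="ICMP",
    src_ip="192.168.10.1",
    dst_ip="192.168.10.2",
    rate_pps=2,
    count=100,
)
print(task.duration_seconds())  # 100 messages at 2 per second take 50.0 seconds
```

Validating the addresses and counts when the record is created means that a malformed task is rejected before it is sent to the device corresponding to the source IP address.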
In the embodiment of the application, a user inputs dial testing information, after a dial testing task is created, a cloud server acquires static resource information of a network link, and a control plane path of the network link is established according to the static resource information. Specifically, the cloud server acquires static resource information of a network link between a source IP address and a destination IP address, where the static resource information includes one or more of the following information: the tenant virtual private cloud VPC, a subnet, a security group, a routing table, port information, an IP address and load balancing information, and the cloud server establishes a control plane path between a source IP address and a destination IP address based on static network resource information. The control plane path refers to a network path for control signaling transmission between a source node and a destination node.
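As a rough illustration of how a control plane path could be derived from static resource information, the sketch below treats the resources as an undirected graph and runs a breadth-first search between the two endpoints. This is an assumption made for illustration only: the application does not specify the search procedure, the function name control_plane_path and the link list are hypothetical, and the node names are borrowed from the fig. 4a example.

```python
from collections import deque


def control_plane_path(links, src, dst):
    """Return the shortest node sequence from src to dst, or None if unreachable.

    links is an iterable of (a, b) pairs, each meaning that resource a is
    directly associated with resource b (for example, an ECS attached to a subnet).
    """
    adjacency = {}
    for a, b in links:
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)

    queue = deque([[src]])
    visited = {src}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == dst:
            return path
        for neighbor in adjacency.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


# Hypothetical associations extracted from static resource information.
links = [
    ("ecs1", "subnet1"),
    ("subnet1", "vpc1"),
    ("vpc1", "vpc2"),
    ("vpc2", "subnet2"),
    ("subnet2", "ecs2"),
]
path = control_plane_path(links, "ecs1", "ecs2")
print("→".join(path))
```

A real implementation would also have to evaluate security groups and routing tables before declaring two resources connected; the sketch ignores those checks.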
In one possible implementation, the cloud server may establish a control plane path between the source IP address and the destination IP address based on the open source tool Batfish. After the cloud server establishes a control plane path between the source IP address and the destination IP address based on the static network resource information, the control plane path between the source IP address and the destination IP address may also be displayed at the management interface.
After the cloud server establishes a control plane path between a source IP address and a destination IP address based on static network resource information, mapping the control plane path according to a first mapping relation to obtain a virtual network path, wherein the virtual network path is a logical data transmission path established based on network resources, and comprises one or more virtual devices between devices corresponding to the source IP address and devices corresponding to the destination IP address. The first mapping relationship comprises a mapping relationship of a control plane path node to a virtual network path node. After the cloud server obtains the virtual network path, the virtual network path and a first mapping relation are displayed on a management interface, wherein the first mapping relation is used for indicating the mapping relation between equipment included in the control plane path and equipment included in the virtual network path.
Referring to table 1, table 1 is a schematic diagram of a first mapping relationship provided in an embodiment of the present application. As shown in table 1, there is a first mapping relationship between the control plane path node and the virtual network path node, for example, the control plane path node "elastic cloud server" corresponds to the virtual network path node "elastic cloud server", the control plane path node "subnet" corresponds to the virtual network path node "computing node proxy", and the control plane path node "virtual private cloud" corresponds to the virtual network path node "virtual router".
TABLE 1
Control plane path node | Virtual network path node
Elastic Cloud Server (ECS) | Elastic Cloud Server (ECS)
Subnet (Subnet) | Computing Node Agent (CNA)
Virtual Private Cloud (VPC) | Virtual router (VROUTER)
After the cloud server obtains a virtual network path between the source IP address and the destination IP address based on the first mapping relation, the virtual network path is mapped to obtain a physical network path according to the second mapping relation, wherein the physical network path is a data transmission path established based on physical equipment connection, and the physical network path comprises one or more physical equipment between equipment corresponding to the source IP address and equipment corresponding to the destination IP address. The second mapping relationship comprises a mapping relationship of a virtual network path node to a physical network path node. And after the cloud server obtains the physical network path, displaying the physical network path and a second mapping relation on the management interface, wherein the second mapping relation is used for indicating that the equipment included in the virtual network path and the equipment included in the physical network path have mapping relation.
Referring to table 2, table 2 is a schematic diagram of a second mapping relationship provided in the embodiment of the present application. As shown in table 2, the virtual network path node and the physical network path node have a second mapping relationship, for example, the virtual network path node "elastic cloud server" corresponds to the physical network path node "elastic cloud server", the virtual network path node "computing node agent" corresponds to the physical network path node "computing node", and the virtual network path node "virtual router" corresponds to the physical network path node "virtual router host".
TABLE 2
Virtual network path node | Physical network path node
Elastic Cloud Server (ECS) | Elastic Cloud Server (ECS)
Computing Node Agent (CNA) | Computing Node (CNA)
Virtual router (VROUTER) | Virtual router (VROUTER) host machine
Referring to fig. 4a, fig. 4a is a schematic diagram of a transmission path of a full link according to an embodiment of the present application. In the example shown in fig. 4a, the cloud server provides a management interface that can display the full-link transmission path from the source IP address to the destination IP address, including a control plane path, a virtual network path, and a physical network path.
In the example shown in fig. 4a, the control plane path is, for example, "ecs1→subnet1→vpc1→vpc2→subnet2→ecs2", the virtual network path is, for example, "ecs1→compute node 1→vRouter 1→compute node 2→ecs2", and the physical network path is, for example, "ecs1→compute node 1→switch2→vRouter 1 host→switch3→switch4→compute node 2→ecs2".
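The two mapping steps can be sketched as dictionary lookups applied node by node, with consecutive duplicates collapsed. The sketch below is only an assumption about how the mapping relations might be applied: the per-node dictionaries and the function map_path are hypothetical, and the node names follow the fig. 4a example. Physical switches have no virtual counterpart, so the second lookup yields only the physical anchor nodes, not the switches between them.

```python
def map_path(path, mapping):
    """Map each node through mapping, collapsing consecutive duplicates."""
    mapped = []
    for node in path:
        target = mapping[node]
        if not mapped or mapped[-1] != target:
            mapped.append(target)
    return mapped


# Hypothetical first mapping relation: control plane node -> virtual network node.
CONTROL_TO_VIRTUAL = {
    "ecs1": "ecs1",
    "subnet1": "compute node 1",
    "vpc1": "vRouter 1",
    "vpc2": "vRouter 1",
    "subnet2": "compute node 2",
    "ecs2": "ecs2",
}

# Hypothetical second mapping relation: virtual network node -> physical node.
VIRTUAL_TO_PHYSICAL = {
    "ecs1": "ecs1",
    "compute node 1": "compute node 1",
    "vRouter 1": "vRouter 1 host",
    "compute node 2": "compute node 2",
    "ecs2": "ecs2",
}

control_path = ["ecs1", "subnet1", "vpc1", "vpc2", "subnet2", "ecs2"]
virtual_path = map_path(control_path, CONTROL_TO_VIRTUAL)
physical_anchors = map_path(virtual_path, VIRTUAL_TO_PHYSICAL)

print("→".join(virtual_path))
print("→".join(physical_anchors))
```

Collapsing duplicates is what turns the two control plane nodes "vpc1" and "vpc2" into the single virtual node "vRouter 1" in this example.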
Referring to fig. 4b, fig. 4b is a schematic diagram illustrating a mapping relationship between transmission paths displayed by a management interface according to an embodiment of the present application. In the example shown in fig. 4b, the cloud server provides a management interface capable of displaying the first mapping relationship between control plane path nodes and virtual network path nodes and the second mapping relationship between virtual network path nodes and physical network path nodes. For example, when the user clicks the control plane path node "vpc1" with the mouse, the management interface displays, through a dotted line, the mapping relationship between "vpc1" in the control plane path and "compute node 1" in the virtual network path; when the user clicks the virtual network path node "compute node 1" with the mouse, the management interface displays, through a dotted line, the mapping relationship between "compute node 1" in the virtual network path and "compute node 1" in the physical network path.
202. The cloud server runs the dial testing task to obtain dial testing results of the path nodes.
After a user creates a dial testing task through a management interface, the cloud server runs the dial testing task based on a network link to obtain a dial testing result of the path node. Specifically, the cloud server sends a dial testing task to an initial node, the initial node generates a dial testing message and transmits the dial testing message to a destination node, the initial node is a path node corresponding to a source IP address, and the destination node is a path node corresponding to a destination IP address. The cloud server acquires mirror image messages of each path node between the starting node and the destination node, the mirror image messages are mirror images of dial test messages, and the cloud server analyzes dial test results of the path nodes according to the mirror image messages.
In one possible implementation, the cloud server dyes (marks) the dial test message. Specifically, the cloud server adds a dial testing task identifier (ID) and a differentiated services code point (DSCP) to the message header of the dial test message, where the dial testing task ID indicates the dial testing task to which the dial test message belongs, and the DSCP indicates the service class of the dial test message. While forwarding the dial test message, a path node in the physical network identifies the dial test message according to the dial testing task ID and the DSCP, generates a mirror message from the identified dial test message, and sends the mirror message to the fault diagnosis module of the cloud server.
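The dyeing step can be illustrated with a minimal IPv4 header built in Python. The DSCP occupies the upper six bits of the second header byte, which is standard; everything else here is an assumption for illustration. In particular, the application does not say which header field carries the dial testing task ID, so the sketch places it in the 16-bit Identification field, and the function names and the DSCP value 46 are hypothetical. The checksum is left at zero, so the header is not valid on the wire.

```python
import socket
import struct


def build_dyed_ipv4_header(src_ip, dst_ip, dscp, task_id, payload_len=0, protocol=1):
    """Build a 20-byte IPv4 header carrying a DSCP value and a task ID."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP is a 6-bit value")
    if not 0 <= task_id <= 0xFFFF:
        raise ValueError("task ID must fit the 16-bit field assumed here")
    version_ihl = (4 << 4) | 5      # IPv4, header length of 5 32-bit words
    tos = dscp << 2                 # DSCP in the upper 6 bits, ECN bits left at 0
    total_length = 20 + payload_len
    return struct.pack(
        "!BBHHHBBH4s4s",
        version_ihl, tos, total_length,
        task_id,                    # assumption: task ID in the Identification field
        0,                          # flags and fragment offset
        64,                         # time to live
        protocol,                   # 1 = ICMP
        0,                          # checksum omitted in this sketch
        socket.inet_aton(src_ip),
        socket.inet_aton(dst_ip),
    )


def is_dial_test(header, dscp, task_id):
    """Check whether a header carries the expected DSCP and task ID."""
    header_dscp = header[1] >> 2
    (identification,) = struct.unpack("!H", header[4:6])
    return header_dscp == dscp and identification == task_id


header = build_dyed_ipv4_header("192.168.10.1", "192.168.10.2", dscp=46, task_id=1001)
print(is_dial_test(header, dscp=46, task_id=1001))
```

A path node would run a check like is_dial_test on each forwarded message and mirror only the messages that match, which keeps ordinary tenant traffic out of the diagnosis data.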
While analyzing the dial testing results of the path nodes according to the mirror messages, the cloud server, after receiving the mirror messages reported by each path node in the physical network path, counts the packet loss rate and delay of each path node based on those mirror messages and generates the complete traffic path of the physical network.
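The per-node statistics can be sketched as follows, assuming that each path node reports one mirror record when a dial test message enters it and one when the message leaves it. The record layout (node, direction, sequence number, timestamp in milliseconds) and the function name node_statistics are assumptions made for this sketch; the application does not define the format of the mirror message.

```python
from collections import defaultdict
from statistics import mean


def node_statistics(mirrors):
    """Compute packet loss rate and average forwarding delay per node.

    mirrors is an iterable of (node, direction, seq, timestamp_ms) tuples,
    where direction is "in" or "out".
    """
    seen = defaultdict(lambda: {"in": {}, "out": {}})
    for node, direction, seq, timestamp_ms in mirrors:
        seen[node][direction][seq] = timestamp_ms

    stats = {}
    for node, records in seen.items():
        ingress, egress = records["in"], records["out"]
        # Only messages observed on both sides count as forwarded.
        forwarded = [seq for seq in ingress if seq in egress]
        loss_rate = (len(ingress) - len(forwarded)) / len(ingress) if ingress else 0.0
        delays = [egress[seq] - ingress[seq] for seq in forwarded]
        stats[node] = {
            "loss_rate": loss_rate,
            "avg_delay_ms": mean(delays) if delays else None,
        }
    return stats


# Five messages enter "vRouter 1"; the last one never leaves it.
mirrors = [("vRouter 1", "in", seq, 10 * seq) for seq in range(5)]
mirrors += [("vRouter 1", "out", seq, 10 * seq + 2) for seq in range(4)]
stats = node_statistics(mirrors)
print(stats["vRouter 1"])
```

Matching records by sequence number, rather than only comparing counts, lets the same pass produce both the packet loss rate and the delay.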
Referring to fig. 5, fig. 5 is a schematic diagram of a traffic path of a physical network according to an embodiment of the present application. In the example shown in fig. 5, the traffic path of the physical network is generated according to the mirror messages reported by the path nodes and is, for example, "VM1→CNA1→compute access switch 1→aggregation switch 1→network access switch 1→vRouter host 1→network access switch 2→aggregation switch 2→compute access switch 2→CNA2→VM2".
In the example shown in fig. 5, the physical network may have more than one traffic path; another traffic path is, for example, "VM1→CNA1→compute access switch 3→aggregation switch 3→network access switch 3→vRouter host 3→network access switch 3→aggregation switch 3→compute access switch 3→CNA2→VM2".
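One way to picture how traffic paths are recovered from mirror messages is to group the mirror records by dial test message and order each group by timestamp. The sketch below assumes a record layout of (sequence number, node, timestamp) and assumes that the clocks of the reporting nodes are synchronized closely enough for the ordering to be meaningful; both are assumptions of this sketch, as is the function name traffic_paths. The node names are shortened for readability.

```python
from collections import defaultdict


def traffic_paths(mirrors):
    """Return the distinct node sequences observed, in order of first appearance.

    mirrors is an iterable of (seq, node, timestamp) tuples, one per mirror message.
    """
    hops = defaultdict(list)
    for seq, node, timestamp in mirrors:
        hops[seq].append((timestamp, node))

    paths = []
    for seq in sorted(hops):
        # Sorting by timestamp reconstructs the forwarding order of one message.
        path = tuple(node for _, node in sorted(hops[seq]))
        if path not in paths:
            paths.append(path)
    return paths


# Hypothetical mirror records: messages 1 and 3 share a path, message 2 takes another.
mirrors = [
    (1, "VM1", 0), (1, "CNA1", 1), (1, "switch 1", 2), (1, "CNA2", 3), (1, "VM2", 4),
    (2, "VM1", 10), (2, "CNA1", 11), (2, "switch 3", 12), (2, "CNA2", 13), (2, "VM2", 14),
    (3, "VM2", 24), (3, "CNA2", 23), (3, "switch 1", 22), (3, "CNA1", 21), (3, "VM1", 20),
]
for path in traffic_paths(mirrors):
    print("→".join(path))
```

Because the records are sorted per message, the order in which mirror messages arrive at the fault diagnosis module does not matter.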
203. The cloud server displays the dial testing results of the path nodes in the transmission path on the management interface.
The cloud server displays a dial testing result of path nodes in a transmission path in a management interface, wherein the transmission path comprises a control plane path, a virtual network path and a physical network path, and the dial testing result comprises packet loss rate and time delay of one or more path nodes in the transmission path.
In one possible embodiment, the cloud server prompts the fault location based on the network path. Specifically, when the dial testing result indicates that the network link has failed, the cloud server marks the abnormal dial testing result in the management interface and marks its position in the network path.
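Marking the fault location can be sketched as a scan along the network path that flags every node whose dial testing result looks abnormal. The thresholds, the result layout, and the function name locate_faults are assumptions of this sketch; the application does not state which values count as abnormal.

```python
def locate_faults(path, results, max_loss_rate=0.0, max_delay_ms=None):
    """Return (position, node, reason) for each abnormal node along the path.

    results maps a node name to {"loss_rate": float, "delay_ms": float}.
    """
    faults = []
    for position, node in enumerate(path):
        result = results.get(node)
        if result is None:
            # Assumption: a node without any result is reported rather than skipped.
            faults.append((position, node, "no dial testing result"))
        elif result["loss_rate"] > max_loss_rate:
            faults.append((position, node, f"packet loss rate {result['loss_rate']:.0%}"))
        elif max_delay_ms is not None and result["delay_ms"] > max_delay_ms:
            faults.append((position, node, f"delay {result['delay_ms']} ms"))
    return faults


virtual_path = ["ecs1", "compute node 1", "vRouter 1", "compute node 2", "ecs2"]
results = {
    "ecs1": {"loss_rate": 0.0, "delay_ms": 0.1},
    "compute node 1": {"loss_rate": 0.0, "delay_ms": 0.3},
    "vRouter 1": {"loss_rate": 0.2, "delay_ms": 0.4},   # 100 in, 80 out
    "compute node 2": {"loss_rate": 0.0, "delay_ms": 0.3},
    "ecs2": {"loss_rate": 0.0, "delay_ms": 0.1},
}
faults = locate_faults(virtual_path, results)
print(faults)
```

Returning the position together with the node name gives the management interface what it needs to highlight the node in the displayed path.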
In a possible implementation manner, the cloud server generates a network fault diagnosis result and a network fault troubleshooting suggestion according to the dial test result, and displays the network fault diagnosis result and the network fault troubleshooting suggestion through the management interface, wherein the network fault troubleshooting suggestion provides at least one troubleshooting suggestion.
Referring to fig. 6a, fig. 6a is a schematic diagram of a dial testing result of a path node according to an embodiment of the present application. In the example shown in fig. 6a, the management interface displays the dial test result of the path node in three layers of transmission paths, the three layers including a control plane transmission path, a virtual network path, and a physical network path.
In the example shown in fig. 6a, when a user clicks the "vRouter" path node in the virtual network path on the management interface, the management interface displays the dial testing result related to the "vRouter" path node. For example, the management interface displays 100 messages in the ingress direction of the "vRouter" path node and 80 messages in the egress direction, and the packet loss rate of the path node can be calculated from these two counts; here, (100 − 80) / 100 = 20%.
Referring to fig. 6b, fig. 6b is a schematic diagram of a management interface of a network fault diagnosis result according to an embodiment of the present application. In the example shown in fig. 6b, the management interface displays the total number of failures of the three-layer network path and the failure details, which include the locations where failures occur and troubleshooting suggestions.
In the example shown in fig. 6b, the management interface displays that the total number of failures of the network link is 1 and that the failure occurs at the computing node 198.147.28.52 on the upstream traffic path of the virtual network. The troubleshooting suggestion is: "Step 1: check whether the physical switch links from the service network card of the computing node to the eth1 network of all vRouter-ENAT nodes are normal. Step 2: check whether the service network cards of all network nodes where vRouter-ENAT is located and the service network card of the computing node are normal, whether packets are being lost, and whether the optical modules at both ends are normal; also check whether the tunnel-bearing VLAN is permitted on the switch port connected to the service port. Step 3: check whether the equal-cost route to the vRouter service VIP of the vRouter-ENAT node configured on the switch is normal and whether the NQA state is normal. Step 4: contact technical support."
Having described the method for diagnosing the failure of the network link provided by the embodiment of the present application, the device provided by the embodiment of the present application is described below with reference to the accompanying drawings.
Referring to fig. 7, fig. 7 is a schematic structural diagram of a fault diagnosis device according to an embodiment of the present application. The device is used for implementing the steps executed by the cloud server in the above embodiments, and as shown in fig. 7, the fault diagnosis device 700 includes a transceiver unit 701 and a processing unit 702.
The processing unit 702 is configured to generate a dial testing task according to dial testing information configured by a user, where the dial testing information includes a source Internet Protocol (IP) address and a destination IP address. The processing unit 702 is further configured to run the dial testing task and display a dial testing result, where the dial testing result includes at least one network path, the network path includes the devices between the device corresponding to the source IP address and the device corresponding to the destination IP address, and the at least one network path includes at least one of the following: a virtual network path and a physical network path.
In one possible embodiment, the dial test information further includes one or more of the following: source port, destination port, transport protocol.
In a possible implementation manner, a mapping relationship exists between a device included in the virtual network path and a device included in the physical network path.
In a possible implementation, the processing unit 702 is further configured to prompt the fault location based on the network path.
In a possible implementation, the processing unit 702 is further configured to provide at least one fault-clearing suggestion.
In a possible implementation manner, the virtual network path includes a virtual device between a device corresponding to the source IP address and a device corresponding to the destination IP address, and the physical network path includes a physical device between the device corresponding to the source IP address and the device corresponding to the destination IP address.
In one possible implementation, the dial test result further includes a packet loss rate or a delay of one or more devices.
In a possible implementation manner, the transceiver unit 701 is configured to obtain static resource information of a network link, where the static resource information includes one or more of the following information: tenant virtual private cloud VPC, subnet, security group, routing table, port information, IP address and load balancing information. The processing unit 702 is further configured to establish a control plane path of the network link based on the static resource information.
It should be understood that the division of the units in the above apparatus is merely a division of a logic function, and may be fully or partially integrated into a physical entity or may be physically separated when actually implemented. And the units in the device can be all realized in the form of software calls through the processing element; or can be realized in hardware; it is also possible that part of the units are implemented in the form of software, which is called by the processing element, and part of the units are implemented in the form of hardware. For example, each unit may be a processing element that is set up separately, may be implemented as integrated in a certain chip of the apparatus, or may be stored in a memory in the form of a program, and the functions of the unit may be called and executed by a certain processing element of the apparatus. Furthermore, all or part of these units may be integrated together or may be implemented independently. The processing element described herein may in turn be a processor, which may be an integrated circuit with signal processing capabilities. In implementation, each step of the above method or each unit above may be implemented by an integrated logic circuit of hardware in a processor element or in the form of software called by a processing element.
It should be noted that, for simplicity of description, the above method embodiments are described as a series of action combinations. However, those skilled in the art should understand that the present application is not limited by the described order of actions. Those skilled in the art should also understand that the embodiments described in the specification are preferred embodiments, and that the actions involved are not necessarily required by the present application.
Other reasonable combinations of steps that can be conceived by those skilled in the art from the foregoing description are also within the scope of the present application.
Referring to fig. 8, fig. 8 is a schematic structural diagram of a computing device according to an embodiment of the present application. As shown in fig. 8, the computing device 800 includes a processor 801, a memory 802, a communication interface 803, and a bus 804; the processor 801, the memory 802, and the communication interface 803 are coupled via a bus (not labeled in the figure). The memory 802 stores instructions; when the instructions are executed by the processor 801, the computing device 800 performs the method performed by the cloud server in the foregoing method embodiments.
Computing device 800 may be one or more integrated circuits configured to implement the above methods, for example: one or more application-specific integrated circuits (ASIC), one or more digital signal processors (DSP), one or more field programmable gate arrays (FPGA), or a combination of at least two of these integrated circuit forms. For another example, when the units in the apparatus are implemented in the form of a processing element scheduling a program, the processing element may be a general-purpose processor, such as a central processing unit (CPU) or another processor that can invoke the program. For another example, the units may be integrated together and implemented in the form of a system-on-a-chip (SOC).
The processor 801 may be a central processing unit (central processing unit, CPU), but may also be other general purpose processors, digital signal processors (digital signal processor, DSP), application specific integrated circuits (application specific integrated circuit, ASIC), field programmable gate arrays (field programmable gate array, FPGA) or other programmable logic devices, transistor logic devices, hardware components, or any combination thereof. The general purpose processor may be a microprocessor, but in the alternative, it may be any conventional processor.
The memory 802 may be volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory. The nonvolatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory. The volatile memory may be random access memory (RAM), which acts as an external cache. By way of example, and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct rambus RAM (DR RAM).
The memory 802 stores executable program codes, and the processor 801 executes the executable program codes to implement the functions of the foregoing transceiver unit and processing unit, respectively, thereby implementing the foregoing methods. That is, the memory 802 has stored thereon instructions for performing the methods described above.
Communication interface 803 enables communication between computing device 800 and other devices or communication networks using a transceiver module such as, but not limited to, a network interface card, transceiver, etc.
The bus 804 may include a power bus, a control bus, a status signal bus, and the like in addition to a data bus. The bus may be a peripheral component interconnect express (PCIe) bus, an extended industry standard architecture (EISA) bus, a unified bus (Ubus or UB), a compute express link (CXL) bus, a cache coherent interconnect for accelerators (CCIX) bus, or the like. The buses may be divided into address buses, data buses, control buses, etc.
Referring to fig. 9, fig. 9 is a schematic diagram of a computing device cluster according to an embodiment of the present application. As shown in fig. 9, the computing device cluster 900 includes at least one computing device 800. The computing device 800 may be a server, such as a central server, an edge server, or a local server in a local data center. In some embodiments, computing device 800 may also be a terminal device such as a desktop, notebook, or smart phone.
The same instructions for performing the above-described fault diagnosis may be stored in the memory 802 of one or more computing devices 800 in the computing device cluster 900.
In some possible implementations, some of the instructions for performing the above-described fault diagnosis may also be stored separately in the memory 802 of one or more computing devices 800 in the computing device cluster 900. In other words, a combination of one or more computing devices 800 may collectively execute instructions for performing the above-described fault diagnostics.
It should be noted that the memories 802 in different computing devices 800 in the computing device cluster 900 may store different instructions, each for performing part of the functions of the foregoing apparatus. That is, the instructions stored in the memories 802 of different computing devices 800 may implement the functions of one or more of the transceiver unit and the processing unit.
In some possible implementations, one or more computing devices 800 in the computing device cluster 900 may be connected by a network. Wherein the network may be a wide area network or a local area network, etc.
Referring to fig. 10, fig. 10 is a schematic diagram of a network connection of computing devices in a computing device cluster according to an embodiment of the present application. As shown in fig. 10, two computing devices 800A and 800B are connected by a network. Specifically, each computing device is connected to the network through its communication interface.
In one possible implementation, instructions for performing the functions of the transceiver unit are stored in the memory of computing device 800A, and instructions for performing the functions of the processing unit are stored in the memory of computing device 800B.
It should be appreciated that the functionality of computing device 800A shown in fig. 10 may also be performed by a plurality of computing devices. Likewise, the functionality of computing device 800B may be performed by multiple computing devices.
In another embodiment of the present application, a computer readable storage medium is further provided. The computer readable storage medium stores computer-executable instructions; when a processor of a device executes the computer-executable instructions, the device performs the method performed by the cloud server in the foregoing method embodiments.
In another embodiment of the present application, there is also provided a computer program product comprising computer-executable instructions stored in a computer-readable storage medium. When the processor of the device executes the computer-executable instructions, the device executes the method executed by the cloud server in the method embodiment described above.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, which are not repeated herein.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.

Claims (19)

1. A method for diagnosing a failure of a network link, comprising:
generating a dial testing task according to dial testing information configured by a user, wherein the dial testing information comprises a source Internet Protocol (IP) address and a destination IP address;
running the dial testing task, and displaying a dial testing result, wherein the dial testing result comprises at least one network path, the network path comprises equipment between equipment corresponding to the source IP address and equipment corresponding to the destination IP address, and the at least one network path comprises at least one of the following: virtual network path, physical network path.
2. The method of claim 1, wherein the dial-test information further comprises one or more of: source port, destination port, transport protocol.
3. A method according to claim 1 or 2, characterized in that the devices comprised by the virtual network path and the devices comprised by the physical network path have a mapping relationship.
4. A method according to any one of claims 1 to 3, wherein the method further comprises:
prompting a fault location based on the network path.
5. The method according to claim 4, wherein the method further comprises:
providing at least one troubleshooting suggestion.
6. The method of any of claims 1-5, wherein the virtual network path comprises a virtual device between the device corresponding to the source IP address and the device corresponding to the destination IP address, and wherein the physical network path comprises a physical device between the device corresponding to the source IP address and the device corresponding to the destination IP address.
7. The method of any of claims 1-6, wherein the dial test result further comprises a packet loss rate or a delay of one or more devices.
8. The method according to any one of claims 1 to 7, further comprising:
acquiring static resource information of the network link, wherein the static resource information comprises one or more of the following information: a tenant virtual private cloud (VPC), a subnet, a security group, a routing table, port information, an IP address, and load balancing information;
and establishing a control plane path of the network link based on the static resource information.
9. A fault diagnosis apparatus for a network link, comprising: a transceiver unit and a processing unit;
the processing unit is configured to generate a dial testing task according to dial testing information configured by a user, wherein the dial testing information comprises a source Internet Protocol (IP) address and a destination IP address;
the processing unit is further configured to run the dial testing task and display a dial testing result, wherein the dial testing result comprises at least one network path, the network path comprises the devices between the device corresponding to the source IP address and the device corresponding to the destination IP address, and the at least one network path comprises at least one of the following: a virtual network path or a physical network path.
10. The apparatus of claim 9, wherein the dial testing information further comprises one or more of: a source port, a destination port, or a transport protocol.
11. The apparatus according to claim 9 or 10, wherein a mapping relationship exists between devices included in the virtual network path and devices included in the physical network path.
12. The apparatus according to any one of claims 9 to 11, wherein the processing unit is further configured to:
prompt a fault location based on the network path.
13. The apparatus of claim 12, wherein the processing unit is further configured to:
provide at least one troubleshooting recommendation.
14. The apparatus according to any one of claims 9 to 13, wherein the virtual network path comprises a virtual device between the device corresponding to the source IP address and the device corresponding to the destination IP address, and wherein the physical network path comprises a physical device between the device corresponding to the source IP address and the device corresponding to the destination IP address.
15. The apparatus of any of claims 9 to 14, wherein the dial testing result further comprises a packet loss rate or a delay of one or more devices.
16. The apparatus according to any one of claims 9 to 15, wherein the transceiver unit is configured to obtain static resource information of the network link, the static resource information including one or more of the following information: a tenant virtual private cloud (VPC), a subnet, a security group, a routing table, port information, an IP address, and load balancing information;
and the processing unit is further configured to establish a control plane path of the network link based on the static resource information.
17. A cluster of computing devices, comprising at least one computing device, each computing device comprising a processor, the processor of the at least one computing device being configured to execute instructions stored in a memory of the at least one computing device to cause the cluster of computing devices to perform the method of any of claims 1-8.
18. A computer readable storage medium having instructions stored thereon, which when executed, cause a computer to perform the method of any of claims 1 to 8.
19. A computer program product comprising instructions which, when executed, cause a computer to carry out the method of any one of claims 1 to 8.
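As an illustration only, the data flow recited in claims 1, 2, 4 and 7 can be sketched as follows. This is a minimal sketch under stated assumptions, not the applicant's implementation: every name here (DialTestInfo, Hop, NetworkPath, generate_task, run_task, locate_fault), the injected probe callback, and the 5% loss threshold are hypothetical and do not appear in the application.

```python
from dataclasses import dataclass, field
from ipaddress import ip_address
from typing import Callable, List, Optional


@dataclass(frozen=True)
class DialTestInfo:
    """User-configured dial testing information (claims 1 and 2)."""
    source_ip: str
    destination_ip: str
    source_port: Optional[int] = None
    destination_port: Optional[int] = None
    transport_protocol: Optional[str] = None


@dataclass
class Hop:
    """One device on a path, with per-device metrics (claim 7)."""
    device: str
    packet_loss_rate: float = 0.0  # fraction in [0, 1]
    delay_ms: float = 0.0


@dataclass
class NetworkPath:
    """Devices between the source device and the destination device."""
    kind: str  # "virtual" or "physical"
    hops: List[Hop] = field(default_factory=list)


@dataclass
class DialTestTask:
    info: DialTestInfo


def generate_task(info: DialTestInfo) -> DialTestTask:
    """Validate the configured addresses and wrap them in a task."""
    ip_address(info.source_ip)       # raises ValueError if malformed
    ip_address(info.destination_ip)
    return DialTestTask(info)


def run_task(
    task: DialTestTask,
    probe: Callable[[DialTestInfo], List[NetworkPath]],
) -> List[NetworkPath]:
    """Run the task; the probe is injected because the actual
    measurement mechanism is outside the scope of this sketch."""
    return probe(task.info)


def locate_fault(path: NetworkPath, loss_threshold: float = 0.05) -> Optional[Hop]:
    """Return the first hop whose loss rate exceeds the (assumed)
    threshold, or None if no hop does (claim 4)."""
    for hop in path.hops:
        if hop.packet_loss_rate > loss_threshold:
            return hop
    return None
```

A caller would build a DialTestInfo, pass it to generate_task, hand the task and a probe function to run_task, and apply locate_fault to each returned path to decide which device to flag.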
CN202211192269.5A 2022-06-17 2022-09-28 Network link fault diagnosis method and device Pending CN117319256A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2023/080443 WO2023241122A1 (en) 2022-06-17 2023-03-09 Network link fault diagnosis method and apparatus

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2022106909489 2022-06-17
CN202210690948 2022-06-17

Publications (1)

Publication Number Publication Date
CN117319256A true CN117319256A (en) 2023-12-29

Family

ID=89248665

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211192269.5A Pending CN117319256A (en) 2022-06-17 2022-09-28 Network link fault diagnosis method and device

Country Status (1)

Country Link
CN (1) CN117319256A (en)

Similar Documents

Publication Publication Date Title
US11570285B2 (en) Packet processing method, network node, and system
US20200403882A1 (en) Network health checker
EP4007214A1 (en) Software-defined network monitoring and fault localization
US8117301B2 (en) Determining connectivity status for unnumbered interfaces of a target network device
WO2016029749A1 (en) Communication failure detection method, device and system
EP4002769A1 (en) System and method for evaluating transmission performance related to network node and related device
WO2021128977A1 (en) Fault diagnosis method and apparatus
US11165672B2 (en) Application performance management integration with network assurance
CN111034123B (en) System, method, and computer readable medium for performing network assurance checks
US20220078106A1 (en) Packet Measurement Method, Device, and System
WO2006028808A2 (en) Method and apparatus for assessing performance and health of an information processing network
US9882784B1 (en) Holistic validation of a network via native communications across a mirrored emulation of the network
EP2586158B1 (en) Apparatus and method for monitoring of connectivity services
EP3624401B1 (en) Systems and methods for non-intrusive network performance monitoring
US10608890B2 (en) Holistic validation of a network via native communications across a mirrored emulation of the network
CN105743687B (en) Method and device for judging node fault
US9154409B2 (en) Method for debugging private VLAN
WO2021027420A1 (en) Method and device used for transmitting data
US20230254244A1 (en) Path determining method and apparatus, and computer storage medium
CN114422399A (en) Fault diagnosis method, device, equipment and storage medium
CN117319256A (en) Network link fault diagnosis method and device
WO2023241122A1 (en) Network link fault diagnosis method and apparatus
CN110545240B (en) Method for establishing label forwarding table and forwarding message based on distributed aggregation system
US20230261979A1 (en) Method, Device, and System for Implementing Service Path Detection
US10917326B1 (en) Methods, systems, and computer readable media for debugging test traffic generation

Legal Events

Date Code Title Description
PB01 Publication