WO2020135547A1 - Method and device for determining the location of a forwarding fault - Google Patents

Method and device for determining the location of a forwarding fault

Info

Publication number
WO2020135547A1
Authority
WO
WIPO (PCT)
Prior art keywords
forwarding table
service
forwarding
controller
lookup
Prior art date
Application number
PCT/CN2019/128517
Other languages
English (en)
French (fr)
Inventor
王仲宇
包德伟
张亮
冯涛
徐志平
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to EP19903433.1A (EP3886364A4)
Publication of WO2020135547A1
Priority to US17/361,733 (US11902087B2)


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0677 Localisation of faults
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/40 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks using virtualisation of network functions or resources, e.g. SDN or NFV entities
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0805 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability
    • H04L43/0817 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability by checking functioning
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00 Routing or path finding of packets in data switching networks
    • H04L45/28 Routing or path finding of packets in data switching networks using route fault recovery
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00 Routing or path finding of packets in data switching networks
    • H04L45/42 Centralised routing
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00 Routing or path finding of packets in data switching networks
    • H04L45/54 Organization of routing tables
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00 Routing or path finding of packets in data switching networks
    • H04L45/64 Routing or path finding of packets in data switching networks using an overlay routing layer
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00 Routing or path finding of packets in data switching networks
    • H04L45/74 Address processing for routing
    • H04L45/745 Address table lookup; Address filtering

Definitions

  • the present application relates to the field of communications, and in particular to a method and device for determining the location of a forwarding fault.
  • An object of the embodiments of the present application is to provide a method and device for determining a forwarding fault location.
  • An aspect of the present application provides a method for determining a forwarding fault location, which includes: a controller receives, from one or more forwarders, multiple table lookup statistical results of multiple forwarding tables for a first service, wherein the multiple forwarding tables have a query order for the first service.
  • the multiple forwarding tables include a first forwarding table and a second forwarding table adjacent to the first forwarding table in the query order.
  • the multiple forwarding tables further include a third forwarding table, and the third forwarding table follows the second forwarding table in the query order.
  • the method further includes: the controller determines second information, the second information indicating that the table lookup statistical result of the third forwarding table for the first service is the same as or similar to the table lookup statistical result of the second forwarding table for the first service; and the controller determines, based on the second information, that the third forwarding table is a forwarding table that has not failed for the first service.
  • the multiple forwarding tables further include a fourth forwarding table and a fifth forwarding table
  • the controller receiving the multiple table lookup statistical results of the multiple forwarding tables for the first service includes: the controller receives a table lookup statistical result of the fourth forwarding table for the first service and a table lookup statistical result of the fifth forwarding table for the first service, where the fourth forwarding table precedes the first forwarding table in the query order, the fifth forwarding table follows the second forwarding table in the query order, and the controller has not received the table lookup statistical results, for the first service, of the forwarding tables between the fourth forwarding table and the fifth forwarding table; the controller determines third information, the third information indicating that the table lookup behavior of the fourth forwarding table for the first service is normal and the table lookup behavior of the fifth forwarding table for the first service is abnormal; and, based on the third information, the controller requests, from the repeater in which the forwarding tables between the fourth forwarding table and the fifth forwarding table are located, the table lookup statistical results of those forwarding tables for the first service.
  • the controller determining, based on the first information, that the second forwarding table is a failed forwarding table includes: the controller determines a table lookup statistical result of a sixth forwarding table for the first service, wherein the sixth forwarding table is of the same type as the second forwarding table, the sixth forwarding table precedes the second forwarding table in the query order, and the sixth forwarding table and the second forwarding table are located in different forwarders; the controller determines fourth information, the fourth information indicating that the table lookup statistical result of the sixth forwarding table for the first service and the table lookup statistical result of the second forwarding table for the first service are different and not similar; and the controller determines, based on the first information and the fourth information, that the second forwarding table is a failed forwarding table.
  • the first forwarding table and the second forwarding table are located in one repeater, or in different repeaters.
  • the controller is a software defined network (SDN) controller
  • the repeater is an SDN repeater
  • the first service is a native IPv4 service, an IPv4 over generic routing encapsulation (IPv4 over GRE) service, a native IPv6 service, an IPv6 over IPv4 service, a Layer 3 virtual private network over segment routing (L3VPN over SR) service, a Layer 2 virtual private network over traffic engineering (L2VPN over TE) service, or an Ethernet virtual private network over virtual extensible LAN (EVPN over VxLAN) service.
  • the first service is an IPv4 service, an IPv6 service, an L3VPN service, an L2VPN service, or an EVPN service.
  • controller including:
  • a receiving unit, configured to receive, from one or more forwarders, multiple table lookup statistical results of multiple forwarding tables for a first service, where the multiple forwarding tables have a query order for the first service, and the multiple forwarding tables include a first forwarding table and a second forwarding table adjacent to the first forwarding table in the query order, the first forwarding table preceding the second forwarding table in the query order;
  • a state determining unit, configured to determine first information based on the table lookup statistical result of the first forwarding table for the first service and the table lookup statistical result of the second forwarding table for the first service, the first information indicating that (1) the table lookup behavior of the first forwarding table for the first service is normal and (2) the table lookup behavior of the second forwarding table for the first service is abnormal;
  • a fault determining unit, configured to determine, based on the first information, that the second forwarding table is a forwarding table that has failed for the first service.
  • the fault determining unit is further configured to: determine second information, the second information indicating that the table lookup statistical result of the third forwarding table for the first service is the same as or similar to the table lookup statistical result of the second forwarding table for the first service; and determine, based on the second information, that the third forwarding table is a forwarding table that has not failed for the first service.
  • the multiple forwarding tables further include a fourth forwarding table and a fifth forwarding table;
  • the receiving unit is specifically configured to: receive a table lookup statistical result of the fourth forwarding table for the first service and a table lookup statistical result of the fifth forwarding table for the first service, wherein the fourth forwarding table precedes the first forwarding table in the query order, the fifth forwarding table follows the second forwarding table in the query order, and the controller has not received the table lookup statistical results, for the first service, of the forwarding tables between the fourth forwarding table and the fifth forwarding table; determine third information, the third information indicating that the table lookup behavior of the fourth forwarding table for the first service is normal and the table lookup behavior of the fifth forwarding table for the first service is abnormal; and, based on the third information, request, from the forwarder in which the forwarding tables between the fourth forwarding table and the fifth forwarding table are located, the table lookup statistical results of those forwarding tables for the first service.
  • the fault determining unit is specifically configured to: determine a table lookup statistical result of a sixth forwarding table for the first service, where the sixth forwarding table is of the same type as the second forwarding table, the sixth forwarding table precedes the second forwarding table in the query order, and the sixth forwarding table and the second forwarding table are located in different repeaters; determine fourth information, the fourth information indicating that the table lookup statistical result of the sixth forwarding table for the first service is different from the table lookup statistical result of the second forwarding table for the first service; and determine, based on the first information and the fourth information, that the second forwarding table is a failed forwarding table.
  • the first forwarding table and the second forwarding table are located in one repeater, or in different repeaters.
  • the controller is a software-defined network SDN controller
  • the repeater is an SDN repeater
  • the first service is a native IPv4 service, an IPv4 over generic routing encapsulation (IPv4 over GRE) service, a native IPv6 service, an IPv6 over IPv4 service, a Layer 3 virtual private network over segment routing (L3VPN over SR) service, a Layer 2 virtual private network over traffic engineering (L2VPN over TE) service, or an Ethernet virtual private network over virtual extensible LAN (EVPN over VxLAN) service.
  • the first service is an IPv4 service, an IPv6 service, an L3VPN service, an L2VPN service, or an EVPN service.
  • the method for determining a fault and the controller can efficiently determine the forwarding table in which a forwarding fault occurs, rather than only locating the forwarding device that has the fault, so the fault can be located more accurately, which helps recover from the fault more quickly.
  • FIG. 1 is a schematic structural diagram of a network 100 according to an embodiment of the present invention.
  • FIG. 2 is a flowchart of a method 200 for querying different forwarding tables in the same repeater according to an embodiment of the present invention
  • FIG. 3 is a flowchart of a method 300 for querying different forwarding tables in the same repeater according to an embodiment of the present invention
  • FIG. 4 is a flowchart of a method 400 for determining forwarding anomalies in an embodiment of the present invention
  • FIG. 5 is a flowchart of a method 500 for determining forwarding anomalies in an embodiment of the present invention
  • FIG. 6 is a flowchart of a method 600 for a controller to receive multiple table lookup statistical results in an embodiment of the present invention
  • FIG. 7 is a flowchart of a method 700 for determining a failed forwarding table in an embodiment of the present invention.
  • FIG. 8 is a schematic structural diagram of a controller 800 in an embodiment of the present invention.
  • FIG. 9 is a schematic structural diagram of a controller 900 in an embodiment of the present invention.
  • FIG. 1 is a schematic diagram of a network structure in an embodiment of the present invention.
  • FIG. 1 is a schematic diagram of a network structure of a network 100.
  • the network 100 includes a controller 110 and repeaters 120 (120a-120d). Each repeater is connected to the controller 110 and controlled by the controller 110. In this embodiment, the number of repeaters may vary, and the network 100 may include more or fewer repeaters.
  • the repeater 120a and the repeater 120d may be provider edge (Provider Edge, PE) devices, and the repeater 120b and the repeater 120c may be provider (Provider, P) devices.
  • There may also be one or more P devices between the repeater 120b and the repeater 120c. These P devices are located inside the operator's network, while the repeater 120a and the repeater 120d are located at the border of the operator's network and are used to connect to a user network or another operator's network.
  • the repeaters 120a-120d may also be repeaters in an enterprise network, or repeaters in a data center of an enterprise or an operator.
  • the repeaters 120a and 120d do not have to be located at the edge of the network; they may be located inside the network.
  • the controller 110 may be a software-defined network (SDN) controller, and the repeaters 120a-120d may be SDN repeaters; alternatively, the controller 110 may be a network management device in a traditional Internet Protocol (IP) network, and the repeaters 120a-120d may be switches or routers in a traditional IP network.
  • the same repeater may need to query different forwarding tables for a message, and different repeaters may also need to query different forwarding tables for the same message, in order to forward the message.
  • FIG. 2 is a flowchart of a method 200 for querying different forwarding tables in the same repeater according to an embodiment of the present invention.
  • a repeater, such as repeater 120a, queries a forwarding information base (FIB) based on the virtual router identification (VRID) carried in the received message and the destination IP address of the message, and hits FIB entry FIB6 260, thereby obtaining an index of a routing table entry (RE). The destination IP address of the message is an IPv6 address.
  • the forwarder determines the routing table entry RE6 262, and further queries the RE6 262 using the destination IP address of the message to obtain the next hop IP address of the message and a tunnel index.
  • the forwarder determines the routing table entry RE4 267 and further queries RE4 267 using the destination IP address of the tunnel to obtain the outgoing port information of the packet and the IP address of the next hop.
  • The outgoing port information includes a port number or a TRUNK ID.
  • the Traffic Management (TM) module sends the message from the inbound interface board to the outbound interface board.
  • the repeater has independent input interface boards and output interface boards.
  • a packet enters the repeater through the inbound interface board and leaves the repeater through the outbound interface board. Operations 202-206 are performed by the inbound interface board, while 212 and subsequent operations are performed by the outbound interface board.
  • an interface board may have both the ability to receive messages and the ability to send messages. In this case, the repeater may have only one interface board, or it may have multiple interface boards. When such an interface board receives a message, the TM module does not need to, or cannot, send the message from that interface board to another interface board; instead, the same interface board completes the sending of the message.
  • the tunnel encapsulation method may be IPv6 based on IPv4 (IPv6 over IPv4).
  • the repeater queries the link layer encapsulation table based on the outgoing port information and the next hop IP address obtained in 208 and the OVID obtained in 212, and hits the link layer encapsulation entry 270 to obtain the destination media access control (MAC) address. It then performs the corresponding link layer encapsulation based on the destination MAC address and other relevant information, in order to transmit the tunnel message in IPv6 over IPv4 format through the tunnel.
  • the outer destination IP address of the tunnel packet is an IPv4 format address, that is, an IPv4 address.
  • FIG. 3 is a flowchart of a method 300 for querying different forwarding entries in the same repeater according to an embodiment of the present invention.
  • the content shown in FIG. 3 is a continuation of the content shown in FIG. 2. According to the table lookup order, the repeater performing the method shown in FIG. 3 comes after the repeater performing the method shown in FIG. 2; in other words, the repeater performing the method shown in FIG. 3 is located downstream of the repeater performing the method shown in FIG. 2.
  • a repeater such as repeater 120b, receives a tunnel message in IPv6 over IPv4 format from a repeater that performs 214, such as repeater 120a, and queries it based on the VRID carried in the tunnel message and the destination IP address in IPv4 format.
  • the FIB on the interface board hits FIB entry FIB4 360, thereby obtaining a RE index.
  • the repeater determines the corresponding RE entry RE4 362 according to the RE index, and queries RE4 362 using the destination IP address of the tunnel packet to obtain the next hop IP address and the outgoing port information of the tunnel packet.
  • the outgoing port information is the outgoing port number or TRUNK ID.
  • the repeater queries the Address Resolution Protocol (ARP) table based on the next hop IP address of the tunnel packet, the outgoing port information, and the OVID, and hits the ARP entry ARP 364, thereby obtaining a destination MAC address. In this way, the repeater can continue to forward the tunnel packet using the destination MAC address.
  • flowchart 200 and flowchart 300 describe how two adjacent repeaters perform multiple table lookup operations, and after successful table lookup, implement forwarding of a tunnel packet.
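  • The lookup chain of FIG. 3 can be sketched in Python as a series of dictionary lookups that also count hits and misses per table. This is a simplified illustration, not the patent's implementation: the table contents, the addresses, the `ovid-7` value, and the function names are all hypothetical, and the RE table is keyed by the RE index only.

```python
from collections import Counter

# Hypothetical table contents for a repeater such as 120b.
FIB4 = {("vr1", "192.0.2.9"): "re-362"}        # (VRID, destination IPv4) -> RE index
RE4 = {"re-362": ("198.51.100.1", "port-3")}    # RE index -> (next hop IP, outgoing port)
ARP = {("198.51.100.1", "port-3", "ovid-7"): "00:00:5e:00:53:01"}  # -> destination MAC

hits, misses = Counter(), Counter()

def lookup(name, table, key):
    """Query one table and record whether the lookup hit or missed."""
    value = table.get(key)
    (hits if value is not None else misses)[name] += 1
    return value

def forward(vrid, dst_ip, ovid):
    """Walk FIB4 -> RE4 -> ARP in lookup order; stop at the first miss."""
    re_index = lookup("FIB4", FIB4, (vrid, dst_ip))
    if re_index is None:
        return None
    next_hop = lookup("RE4", RE4, re_index)
    if next_hop is None:
        return None
    ip, port = next_hop
    return lookup("ARP", ARP, (ip, port, ovid))

mac = forward("vr1", "192.0.2.9", "ovid-7")        # every table hits
dropped = forward("vr1", "203.0.113.5", "ovid-7")  # misses in FIB4, later tables are not queried
```

  • Because a miss stops the chain, tables later in the lookup order see fewer queries than earlier ones, which is why per-table counters of this kind can hint at where forwarding breaks.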
  • the forwarder can forward packets belonging to different services, and tunnel packets are only one of them.
  • a repeater may need to perform multiple table lookup operations.
  • the message will be forwarded to the next transponder according to the table lookup result used to indicate the next hop in the multiple table lookup operations.
  • the forwarded packets may undergo some processing, such as encapsulation, decapsulation, and security detection.
  • in order to forward the message, each repeater needs to query its internal entries in sequence. Therefore, for a message, the entries in a repeater can be understood as having a table lookup order. For example, for the IPv6 over IPv4 tunnel messages in this embodiment, the forwarding entries 260, 262, 264, 266, 268, and 270 can be considered to be arranged in table lookup order.
  • FIB6 260 is the first of these forwarding entries, and the link layer encapsulation entry 270 is the last. Each entry has neighbors along the table lookup order: for example, FIB6 260 and RE6 262 are adjacent, and RE6 262 and TNL 264 are also adjacent.
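  • As a minimal sketch of this ordering, the entries can be held in a Python list and adjacency derived from consecutive positions. The string labels and the helper name `adjacent_pairs` are illustrative assumptions; the list follows the entries 260 to 270 named above.

```python
# Entries of FIG. 2 in table lookup order (labels are illustrative).
LOOKUP_ORDER = ["FIB6 260", "RE6 262", "TNL 264", "FIB4 266", "TNL 268", "LE 270"]

def adjacent_pairs(order):
    """Return (earlier, later) pairs of entries that are neighbors in the lookup order."""
    return list(zip(order, order[1:]))

pairs = adjacent_pairs(LOOKUP_ORDER)
first_entry, last_entry = LOOKUP_ORDER[0], LOOKUP_ORDER[-1]
```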
  • FIG. 4 is a flowchart of a method 400 for determining forwarding anomalies in an embodiment of the present invention.
  • the repeater 120 receives messages of the first service, queries multiple forwarding tables inside the repeater 120, processes the messages according to the results of querying the multiple forwarding tables, and sends the processed messages to the network.
  • the multiple forwarding tables may refer to part or all of the FIB table where the forwarding entry FIB6 260 is located, the RE6 table where the forwarding entry RE6 262 is located, the TNL table where the forwarding entry TNL 264 is located, the FIB4 table where the forwarding entry FIB4 266 is located, the TNL table where the forwarding entry TNL 268 is located, and the link layer encapsulation table where the link layer encapsulation entry 270 is located.
  • the "plurality" refers to two or more.
  • the forwarder 120 determines the multiple table lookup statistical results of the multiple forwarding tables for the first service with respect to the packet of the first service.
  • a table lookup statistical result may indicate the number of successes, the success rate, the number of failures, or the failure rate with which the repeater queries a forwarding table based on the packets of the first service received within a time period.
  • for example, the repeater 120a receives 100 packets of the first service within 5 seconds and uses these 100 packets to perform 100 queries on the FIB where FIB6 260 is located, 90 of which obtain a lookup result based on FIB6 260.
  • in this case, the table lookup statistical result of the FIB for the first service may be 90 successes, a 90% success rate, 10 failures, or a 10% failure rate.
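  • The four forms of the statistical result follow from two counters. The sketch below uses the figures of the example above; the class name `LookupStats` and its fields are hypothetical, and treating a period with no lookups as a rate of 0.0 is an assumption made only to keep the sketch total.

```python
from dataclasses import dataclass

@dataclass
class LookupStats:
    """Per-table, per-service lookup counters for one time period (hypothetical structure)."""
    table: str
    service: str
    lookups: int  # queries made against the table
    hits: int     # queries that obtained a lookup result

    @property
    def failures(self) -> int:
        return self.lookups - self.hits

    @property
    def success_rate(self) -> float:
        # Assumption: with no lookups in the period, report 0.0.
        return self.hits / self.lookups if self.lookups else 0.0

    @property
    def failure_rate(self) -> float:
        return self.failures / self.lookups if self.lookups else 0.0

# 100 queries on the FIB where FIB6 260 is located, 90 of them successful.
stats = LookupStats(table="FIB6 260", service="first service", lookups=100, hits=90)
```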
  • Some or all of the repeaters 120 send multiple table lookup statistics of the multiple forwarding tables to the controller 110.
  • for example, the repeater can send part or all of the table lookup statistical results of FIB6 260, RE6 262, TNL 264, FIB4 266, TNL 268, and the link layer encapsulation entry 270 to the controller 110.
  • the FIB6 260 table lookup statistical result is a brief expression of the table lookup statistical result of the forwarding table where the FIB6 260 is located. This brief expression is also applicable to other table lookup statistical results.
  • the controller 110 determines the faulty forwarding table according to the received statistical results of the multiple table lookups.
  • the contents of the statistical results of the multiple table lookups received by the controller 110 may have various situations.
  • the multiple table lookup statistical results include table lookup statistical results of two adjacent tables for the service, such as the FIB table where FIB6 260 is located and the RE table where RE6 262 is located. Based on these two results, the controller 110 can determine that the table lookup statistical result of the table looked up first, such as the FIB table where FIB6 260 is located, is normal, and that the table lookup statistical result of the table looked up later, such as the RE table where RE6 262 is located, is abnormal.
  • the two adjacent tables may be inside the same transponder, or may belong to two adjacent transponders respectively. For example, when the table to be checked first refers to the FIB360 shown in FIG. 3, the table to be checked later is not in the same transponder as the FIB360, but in two adjacent transponders.
  • whether the table lookup statistical result of a forwarding table is normal or abnormal can be determined by comparing the number of successes, the success rate, the number of failures, or the failure rate of querying the forwarding table with a threshold. This comparison can be done by the controller or by the repeater. If it is done by the repeater, the controller can learn directly from the repeater whether the table lookup statistical result of a forwarding table is normal. In addition, normality or abnormality may be a relative concept: the controller may determine whether the table lookup statistical result of a later forwarding table is worse than that of the adjacent earlier forwarding table by more than a threshold.
  • if the difference exceeds the threshold, the table lookup behavior of the later forwarding table for packets of the service can be considered abnormal, and the table lookup behavior of the earlier forwarding table for packets of the same service can be considered normal. For example, suppose the table lookup success rate of the earlier FIB6 260 is 90% and that of the later RE6 262 is 60%, so the difference in success rate is 30%. If the threshold is 20%, the table lookup behavior related to RE6 262 is abnormal relative to that related to FIB6 260, and the table lookup behavior related to FIB6 260 is normal relative to that related to RE6 262.
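  • Both kinds of comparison, against a fixed threshold and relative to the adjacent earlier table, can be sketched as follows. The function and parameter names are hypothetical, and using the success rate as the compared quantity is one of the options the text allows.

```python
def is_abnormal_absolute(success_rate, min_rate):
    """Absolute check: abnormal when the success rate falls below a fixed threshold."""
    return success_rate < min_rate

def is_abnormal_relative(earlier_rate, later_rate, max_drop):
    """Relative check: abnormal when the later table is worse than the
    adjacent earlier table by more than the threshold."""
    return (earlier_rate - later_rate) > max_drop

# Example from the text: FIB6 260 at 90%, RE6 262 at 60%, threshold 20%.
re6_abnormal = is_abnormal_relative(0.90, 0.60, 0.20)
```

  • With these values the drop is 30%, which exceeds the 20% threshold, so the lookup behavior related to RE6 262 is flagged as abnormal relative to FIB6 260.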
  • the controller 110 determines that the forwarding table checked later is a failed forwarding table.
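  • One way to express this rule is to scan the tables in lookup order and return the later table of the first adjacent pair whose earlier table is normal and whose later table is abnormal. The sketch below assumes each table's normal or abnormal status has already been decided; the function name and the input format are hypothetical.

```python
def locate_faulty_table(ordered_status):
    """ordered_status: list of (table, is_normal) tuples in table lookup order.

    Returns the later table of the first adjacent normal -> abnormal pair,
    or None when no such pair exists.
    """
    for (_, earlier_ok), (later, later_ok) in zip(ordered_status, ordered_status[1:]):
        if earlier_ok and not later_ok:
            return later
    return None

faulty = locate_faulty_table([
    ("FIB6 260", True),   # looked up first, statistics normal
    ("RE6 262", False),   # looked up later, statistics abnormal
])
```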
  • the multiple table lookup statistical results received by the controller 110 include table lookup statistical results of two non-adjacent tables for the service, and do not include table lookup statistical results of the tables between these two non-adjacent tables.
  • of the two non-adjacent tables, the table lookup statistical result of the table looked up first is normal, and that of the table looked up later is abnormal.
  • the two non-adjacent tables may be located on the same interface board, on different interface boards, or on different repeaters.
  • for example, the controller 110 receives the table lookup statistical result of the RE table where RE6 262 is located and that of the RE table where RE4 267 is located, where both RE tables are on the same interface board.
  • the controller 110 has received neither the table lookup statistical result of the TNL table where TNL 264 is located nor that of the FIB table where FIB4 266 is located.
  • the controller 110 may also receive the table lookup statistical result of the FIB table where FIB6 260 is located and those of tables after RE4 267, such as the TNL table where TNL 268 is located and the LE table where LE 270 is located.
  • the controller 110 notifies the one or two repeaters containing the two tables to send to the controller 110 the table lookup statistical results, for the service, of one or more intermediate tables between the two tables.
  • when the two tables belong to one repeater, that repeater can upload the results according to the notification of the controller 110;
  • when the two tables belong to two repeaters, whichever of those two repeaters, and of any repeaters between them, store the one or more intermediate tables may send the table lookup statistical results of the intermediate tables for the service to the controller 110 according to the notification of the controller 110;
  • the controller 110 determines an abnormal table lookup action based on the received table lookup statistical results of the one or more intermediate tables.
  • the specific method may include: determining two adjacent tables from among the two non-adjacent tables and the one or more intermediate tables, wherein, of the two adjacent tables, the table lookup statistical result of the forwarding table looked up first is normal and that of the forwarding table looked up later is abnormal; and determining that the forwarding table looked up later is the faulty forwarding table.
  • for example, the two non-adjacent tables are the RE table where RE6 262 is located and the RE table where RE4 267 is located, and the one or more intermediate tables are the TNL table where TNL 264 is located and the FIB table where FIB4 266 is located.
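  • The refinement step can be sketched as two small helpers: one finds the first reported normal table that is followed by a reported abnormal table, and the other lists the unreported tables in between that the controller would request. The helper names are hypothetical, and the statuses assigned to the intermediate tables are made up purely for illustration.

```python
def first_bad_pair(order, status):
    """Among tables that reported, find the first (normal, abnormal) consecutive pair."""
    reported = [t for t in order if t in status]
    for earlier, later in zip(reported, reported[1:]):
        if status[earlier] and not status[later]:
            return earlier, later
    return None

def tables_to_request(order, status):
    """Intermediate tables between the pair found above; empty if the pair is adjacent."""
    pair = first_bad_pair(order, status)
    if pair is None:
        return []
    i, j = order.index(pair[0]), order.index(pair[1])
    return order[i + 1:j]

order = ["RE6 262", "TNL 264", "FIB4 266", "RE4 267"]
status = {"RE6 262": True, "RE4 267": False}   # True = normal, False = abnormal
missing = tables_to_request(order, status)     # tables the controller asks for

# Hypothetical follow-up reports for the intermediate tables.
status.update({"TNL 264": True, "FIB4 266": False})
narrowed = first_bad_pair(order, status)
```

  • Once the pair returned by `first_bad_pair` is adjacent in the lookup order, `tables_to_request` returns an empty list and the later table of the pair is the candidate faulty table.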
  • the controller 110 may treat this as a final result, or may treat it as a possible result and verify it further.
  • when the controller 110 determines that a forwarding table on one repeater may have failed for a service, it may compare the table lookup statistical result of that forwarding table with the table lookup statistical result, for the same service, of a forwarding table of the same type on another repeater. For the service, the forwarding table of the same type on the other repeater precedes the possibly failed forwarding table in the table lookup order.
  • for example, the controller 110 determines that the RE table where RE4 362 is located may have failed for a service. The controller 110 may then compare the table lookup statistical result of the RE table where RE4 362 is located for the service with that of the forwarding table of the same type that is earlier in the table lookup order, that is, the forwarding table where RE4 267 is located. When the table lookup statistical result of that forwarding table of the same type is normal, the controller 110 can more confidently determine that the RE table where RE4 362 is located has failed for the service.
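  • This verification can be sketched as a check that the same-type upstream table is normal and that its result is neither the same as nor similar to that of the suspect table. The function name, the use of success rates, and both thresholds are assumptions chosen for illustration.

```python
def confirm_fault(suspect_rate, upstream_same_type_rate, min_rate, max_gap):
    """Confirm a suspected table fault using a same-type table on an upstream repeater.

    The fault is confirmed only if the upstream table looks normal and the
    two success rates differ by more than max_gap (i.e. they are not similar).
    """
    upstream_normal = upstream_same_type_rate >= min_rate
    dissimilar = abs(upstream_same_type_rate - suspect_rate) > max_gap
    return upstream_normal and dissimilar

# Hypothetical rates: RE4 362 (suspect) at 55%, RE4 267 (upstream, same type) at 95%.
confirmed = confirm_fault(0.55, 0.95, min_rate=0.80, max_gap=0.20)
```

  • If the upstream table shows a similarly poor rate, the check returns False, which suggests the cause may lie earlier in the lookup order rather than in the suspect table.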
  • The controller 110 can determine the faulty forwarding table from the table lookup statistical results that the forwarders report for multiple forwarding tables for the same service. The fault can therefore be located precisely, which facilitates subsequent handling of the fault, such as clearing it.
  • FIG. 5 is a flowchart of a method 500 for determining forwarding anomalies in an embodiment of the present invention.
  • The controller receives multiple table lookup statistical results of multiple forwarding tables for a first service, where the multiple forwarding tables have a query order for the first service, and the multiple forwarding tables include a first forwarding table and a second forwarding table that is adjacent to the first forwarding table in the query order, with the first forwarding table preceding the second forwarding table.
  • the controller may be the controller 110 shown in FIG. 1 or FIG. 4.
  • The foregoing embodiments have explained, with reference to FIGS. 2 and 3, that the forwarding tables in one or more forwarders have a query order for a service.
  • The one or more forwarders may be one or more of the forwarders 120 shown in FIG. 1 or FIG. 4.
  • In one example, the first forwarding table may be the FIB table containing FIB6 260 and the second forwarding table may be the RE table containing RE6 262.
  • Alternatively, the first forwarding table may be the RE table containing RE4 267 and the second forwarding table may be the TNL table containing TNL 268.
  • Alternatively, the first forwarding table may be the LE table containing LE 270 and the second forwarding table may be the FIB table containing FIB4 360.
  • The multiple forwarding tables may be all or some of the forwarding tables corresponding to entries 260-270 shown in FIG. 2, all or some of the forwarding tables corresponding to entries 360-364 shown in FIG. 3, or all or some of the forwarding tables corresponding to entries 260-268 and 360-364 in FIGS. 2 and 3.
  • the controller may receive multiple table lookup statistics of the multiple forwarding tables for the first service in various ways.
  • the controller may obtain multiple table lookup statistical results of the multiple forwarding tables for the first service based on a request or without sending the request.
  • In another example, the controller may first obtain one or more table lookup statistical results for the first service from some of the multiple forwarding tables and then, when needed, send a request to one or more forwarders to obtain one or more table lookup statistical results for the first service from some or all of the remaining forwarding tables.
  • The method shown in FIG. 6 is a specific example of receiving the multiple table lookup statistical results of the multiple forwarding tables for the first service in this way. FIG. 6 is described in detail later in this application.
  • After the controller receives the multiple table lookup statistical results, method 500 proceeds to 510.
  • The controller determines first information based on the table lookup statistical result of the first forwarding table for the first service and the table lookup statistical result of the second forwarding table for the first service.
  • The first information indicates that the table lookup behavior of the first forwarding table for the first service is normal and that the table lookup behavior of the second forwarding table for the first service is abnormal.
  • the foregoing embodiments of the present invention have introduced the normal or abnormal table lookup behavior of a forwarding table for a service.
  • Whether the table lookup statistical result of a forwarding table for a service is normal or abnormal can be determined by comparing the number of successes, the success rate, the number of failures, or the failure rate of queries to that forwarding table for the service against a threshold, or by other similar operations. The comparison can be made by the controller or by the forwarder. If the forwarder makes it, the controller can learn directly from the forwarder whether the table lookup statistical result of a forwarding table is normal.
  • Whether the table lookup statistical result of a forwarding table is normal or abnormal can also be a relative notion.
  • The controller can determine whether the table lookup statistical result of a later forwarding table has deteriorated, relative to that of the adjacent earlier forwarding table, by at least a threshold. If the threshold is reached, the table lookup behavior of the later forwarding table for the packets can be considered abnormal, and the table lookup behavior of the earlier forwarding table for packets of the same service can be considered normal. For example, if the lookup success rate of the earlier FIB6 260 is 90% and the lookup success rate of the adjacent later RE6 262 is 60%, the difference in success rate is 30%.
  • If the threshold is 20%, then the table lookup behavior related to RE6 262 is abnormal, that is, abnormal relative to the table lookup behavior related to FIB6 260, and the table lookup behavior related to FIB6 260 is normal, that is, normal relative to the table lookup behavior related to RE6 262.
  • the first service in this embodiment may be divided according to different granularities.
  • When the first service is divided at a finer granularity, it may be a native IPv4 service, an IPv4 service based on generic routing encapsulation (IPv4 over GRE), a native IPv6 service, an IPv6 service based on IPv4 (IPv6 over IPv4), a Layer 3 virtual private network service based on segment routing (L3VPN over Segment Routing), a Layer 2 virtual private network service based on traffic engineering (L2VPN over TE), an Ethernet virtual private network service based on a virtual extensible local area network (EVPN over VxLAN), or another similar service.
  • When the first service is divided at a coarser granularity, it may be an IPv4 service, an IPv6 service, an L3VPN service, an L2VPN service, or an EVPN service.
  • the controller determines that the second forwarding table is a forwarding table that fails for the first service based on the first information.
  • According to the first information, the first forwarding table, which comes earlier, has normal table lookup behavior for the first service, and its neighbor that comes later, namely the second forwarding table, has abnormal table lookup behavior for the first service. Because the table lookup behavior of the first forwarding table for the first service is normal, the first forwarding table has not failed for the first service, and the second forwarding table is the forwarding table that has failed for the first service.
  • In some examples, the controller may further verify whether the second forwarding table is a faulty forwarding table, and finally determine that it is one only after the verification confirms this.
  • This verification process can also be regarded as a part of the operations in which the controller determines that the second forwarding table is a forwarding table that fails for the first service based on the first information.
  • the details of verification in this part are introduced by the embodiment corresponding to FIG. 7.
  • the embodiment shown in FIG. 5 may further include optional operations 520 and 525.
  • The controller determines second information, which indicates that the table lookup statistical result of a third forwarding table among the multiple forwarding tables for the first service is the same as or similar to the table lookup statistical result of the second forwarding table for the first service, where the third forwarding table follows the second forwarding table in the query order.
  • For example, the third forwarding table is the TNL table containing TNL 264.
  • When the second forwarding table has been determined to be the forwarding table that has failed for the first service, and the table lookup statistical result of the third forwarding table for the first service is the same as or similar to that of the second forwarding table, the abnormal table lookup result of the third forwarding table for the first service is caused by the failure of the second forwarding table, and the table lookup behavior of the third forwarding table itself is normal. "Similar" here means that the difference between the two compared results is smaller than a set criterion, which the network administrator can choose based on practical experience.
  • the controller determines that the third forwarding table is a forwarding table that has not failed for the first service based on the second information.
  • FIG. 6 is a flowchart of a method 600 for a controller to receive multiple table lookup statistics in an embodiment of the present invention. This method is a specific example of implementing 505. The method shown in FIG. 6 may be based on 505-515 or 505-525.
  • The controller receives the query statistical result of a fourth forwarding table among the multiple forwarding tables for the first service and the query statistical result of a fifth forwarding table among the multiple forwarding tables for the first service, where the fourth forwarding table precedes the first forwarding table in the query order, the fifth forwarding table follows the second forwarding table in the query order, and the controller has not received the query statistical results for the first service of the forwarding tables between the fourth forwarding table and the fifth forwarding table.
  • In one example, the first forwarding table and the second forwarding table are the forwarding tables containing TNL 264 and FIB4 266, respectively.
  • The fourth forwarding table may be the forwarding table containing RE6 262 or the forwarding table containing FIB6 260.
  • The fifth forwarding table may be the forwarding table containing any one of forwarding entries 267-270, or the forwarding table containing any one of forwarding entries 267-364.
  • The controller determines third information, which indicates that the table lookup behavior of the fourth forwarding table for the first service is normal and that the table lookup behavior of the fifth forwarding table for the first service is abnormal.
  • Based on the third information, the controller requests, from the forwarder or forwarders hosting the forwarding tables between the fourth forwarding table and the fifth forwarding table, the query statistical results of those forwarding tables for the first service.
  • When those forwarding tables are located in a single forwarder, the controller may request their query statistical results from that forwarder.
  • When they are located in different forwarders, the controller needs to request their query statistical results from each of those forwarders.
  • FIG. 7 is a flowchart of a method 700 for determining a faulty forwarding table in an embodiment of the present invention. This method is a specific implementation of 515.
  • The controller determines the table lookup statistical result of a sixth forwarding table for the first service, where the sixth forwarding table is of the same type as the second forwarding table, precedes the second forwarding table in the query order, and is located in a different forwarder from the second forwarding table.
  • For example, if the second forwarding table is the RE4 forwarding table containing RE4 362 in the forwarder shown in FIG. 3, the sixth forwarding table may be an RE4 forwarding table in another forwarder that comes before it, for example the RE4 forwarding table containing RE4 267 shown in FIG. 2.
  • Two forwarding tables of the same type may refer to forwarding tables having the same structure or the same field correspondence.
  • the field correspondence refers to the correspondence between two fields. For example, when one field is an IP address and the other field is a MAC address, the correspondence between the fields based on these two fields is that the IP address corresponds to the MAC address.
  • The controller determines fourth information, which indicates that the table lookup statistical result of the sixth forwarding table for the first service and the table lookup statistical result of the second forwarding table for the first service are different and not similar. "Not similar" means that the difference between the two table lookup statistical results exceeds a set threshold.
  • For example, the table lookup statistical result of the sixth forwarding table for the first service is a lookup success rate of 90%, and the table lookup statistical result of the second forwarding table for the first service is a lookup success rate of 70%.
  • These two table lookup statistical results are different. Suppose that, in this example, a result within 10% above or below a given table lookup statistical result is considered similar to it.
  • Under that assumption, the second forwarding table's result would be similar to the sixth forwarding table's result only if its lookup success rate were between 80% and 100%. In this example the lookup success rate of the second forwarding table is 70%, which is outside that range, so the table lookup statistical results of the two forwarding tables for the first service are considered not similar.
  • the controller determines that the second forwarding table is a failed forwarding table based on the first information and the fourth information.
  • the second forwarding table can be determined more accurately as a failed forwarding table.
  • FIG. 8 is a schematic structural diagram of a controller 800 in an embodiment of the present invention.
  • the controller 800 may be the controller 110 in the above-described embodiment, and can perform all operations performed by the controller 110 in the above-described embodiment based on a plurality of units inside thereof.
  • the controller 800 may perform all operations performed by the controller in the methods shown in FIGS. 1-7 based on multiple units inside it.
  • the controller 800 includes a receiving unit 805, a state determining unit 810, and a fault determining unit 815.
  • the receiving unit 805, the status determining unit 810, and the fault determining unit 815 may be three independent hardware units, or may be three software units.
  • The receiving unit 805 is configured to receive, from one or more forwarders, multiple table lookup statistical results of multiple forwarding tables for the first service, where the multiple forwarding tables have a query order for the first service, the multiple forwarding tables include a first forwarding table and a second forwarding table adjacent to the first forwarding table in the query order, and the first forwarding table precedes the second forwarding table in the query order.
  • the state determining unit 810 is configured to determine the first information based on the statistical result of the table lookup of the first forwarding table for the first service and the statistical result of the table lookup of the second forwarding table for the first service.
  • the first information indicates that the table lookup behavior of the first forwarding table for the first service is normal and the table lookup behavior of the second forwarding table for the first service is abnormal.
  • The fault determining unit 815 is configured to determine, based on the first information, that the second forwarding table is the forwarding table that has failed for the first service.
  • The fault determining unit 815 may be further configured to: determine second information, which indicates that the table lookup statistical result of the third forwarding table for the first service is the same as or similar to that of the second forwarding table for the first service, where the third forwarding table follows the second forwarding table in the query order; and determine, based on the second information, that the third forwarding table has not failed for the first service.
  • The receiving unit 805 may be specifically configured to: receive the query statistical result of the fourth forwarding table for the first service and the query statistical result of the fifth forwarding table for the first service, where the fourth forwarding table precedes the first forwarding table in the query order, the fifth forwarding table follows the second forwarding table in the query order, and the controller has not received the query statistical results for the first service of the forwarding tables between the fourth forwarding table and the fifth forwarding table; determine third information, which indicates that the table lookup behavior of the fourth forwarding table for the first service is normal and that the table lookup behavior of the fifth forwarding table for the first service is abnormal; and, based on the third information, request from one or more forwarders the query statistical results for the first service of the forwarding tables between the fourth forwarding table and the fifth forwarding table.
  • The fault determining unit 815 may be specifically configured to: determine the table lookup statistical result of a sixth forwarding table for the first service, where the sixth forwarding table is of the same type as the second forwarding table, precedes the second forwarding table in the query order, and is located in a different forwarder from the second forwarding table; determine fourth information, which indicates that the table lookup statistical result of the sixth forwarding table for the first service and that of the second forwarding table for the first service are different and not similar; and determine, based on the first information and the fourth information, that the second forwarding table is the faulty forwarding table.
  • The first forwarding table and the second forwarding table are located in one forwarder, or in different forwarders.
  • The controller is a software-defined network (SDN) controller, and the forwarder is an SDN forwarder.
  • The first service is a native IPv4 service, an IPv4 service based on generic routing encapsulation (IPv4 over GRE), a native IPv6 service, an IPv6 service based on IPv4 (IPv6 over IPv4), a Layer 3 virtual private network service based on segment routing (L3VPN over Segment Routing), a Layer 2 virtual private network service based on traffic engineering (L2VPN over TE), or an Ethernet virtual private network service based on a virtual extensible local area network (EVPN over VxLAN).
  • the first service may also be an IPv4 service, an IPv6 service, an L3VPN service, an L2VPN service, or an EVPN service.
  • the controller 900 includes a processor 910, a memory 920 in communication with the processor 910, and a transceiver 930.
  • The processor 910 may include one or more central processing units (CPUs), one or more network processors (NPs), one or more application-specific integrated circuits (ASICs), one or more programmable logic devices (PLDs), or a combination of some or all of these types of processing devices.
  • The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
  • the memory 920 may refer to one memory, or may include multiple memories.
  • The memory 920 may include volatile memory, such as random-access memory (RAM); it may also include non-volatile memory, such as read-only memory (ROM), flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); and it may include a combination of the above types of memory.
  • The memory 920 stores computer-readable instructions, including an operating system 922 run by the controller 900 and multiple software units that implement the controller 900, such as a receiving unit 924, a state determining unit 926, and a fault determining unit 928.
  • the transceiver 930 may refer to two interface boards, or may refer to different ports on the same interface board.
  • When the processor 910, running the operating system 922, executes the computer-readable instructions in the receiving unit 924, the state determining unit 926, and the fault determining unit 928, the processor 910 may perform, or cause the controller 900 to perform, the functions and operations performed by the receiving unit 805, the state determining unit 810, and the fault determining unit 815 of the controller 800, respectively.
  • The processor 910 may perform corresponding operations according to the instructions of each software unit. After executing the computer-readable instructions in the memory 920, the processor 910 may, as directed by those instructions, perform or cause the controller 900 to perform all operations performed by the controller described above in this application, for example the controller 110.
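
The "same or similar" comparison used when clearing the third forwarding table, and the "different and not similar" comparison used when verifying against the sixth forwarding table, could be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, success rates are written as whole percentages, the 10-point tolerance is the value assumed in the example above, and treating the boundaries of the 80% to 100% range as inclusive is a further assumption.

```python
def is_similar(rate_a: int, rate_b: int, tolerance: int = 10) -> bool:
    """Two lookup success rates (in percent) are similar if they differ by at most `tolerance`."""
    return abs(rate_a - rate_b) <= tolerance


def third_table_cleared(second_rate: int, third_rate: int) -> bool:
    """A later table whose result tracks the faulty second table is itself considered not faulty."""
    return is_similar(second_rate, third_rate)


def fault_confirmed(sixth_rate: int, second_rate: int) -> bool:
    """The suspicion is confirmed when the upstream same-type table's result is not similar."""
    return not is_similar(sixth_rate, second_rate)


# Example from the text: sixth table at 90%, second table at 70%.
confirmed = fault_confirmed(90, 70)
```

In practice the tolerance would be chosen by the network administrator, as the text notes for the similarity criterion.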

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Environmental & Geological Engineering (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

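Embodiments of the present invention provide a method and a device for determining the location of a forwarding fault. According to the method, a controller receives, from one or more forwarders, multiple table lookup statistical results of multiple forwarding tables for a first service, where the multiple forwarding tables include a first forwarding table and a second forwarding table, and the first forwarding table precedes the second forwarding table in a query order. The controller then determines, based on the table lookup statistical result of the first forwarding table for the first service and the table lookup statistical result of the second forwarding table for the first service, that the table lookup behavior of the first forwarding table for the first service is normal and that the table lookup behavior of the second forwarding table for the first service is abnormal. Based on this determination, the controller determines that the second forwarding table is the forwarding table that has failed for the first service. In this way, the controller can efficiently identify the faulty forwarding table rather than merely locating the faulty forwarder.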

Description

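Method and device for determining the location of a forwarding fault
This application claims priority to Chinese Patent Application No. CN201811634089.1, filed with the China National Intellectual Property Administration on December 29, 2018 and entitled "Method and device for determining the location of a forwarding fault", which is incorporated herein by reference in its entirety.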
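Technical Field
This application relates to the communications field, and in particular to a method and a device for determining the location of a forwarding fault.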
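Background
In network communication, once a forwarding fault occurs in data transmission, network administrators want to locate and clear the fault as quickly as possible so that the network returns to its normal state. In the process of restoring a network from a faulty state to a normal state, locating the fault takes the longest. Therefore, to quickly restore a network in which a forwarding fault has occurred, the fault needs to be located as soon as possible. Forwarding faults caused by packets failing to be forwarded in a network made up of multiple forwarding devices are common. Because network topologies are complex and forwarding a packet involves multiple forwarding devices, locating a forwarding fault is difficult: it takes a long time and considerable manpower, and demands a high level of skill from the personnel performing fault location.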
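Summary
One objective of the embodiments of this application is to provide a method and a device for determining the location of a forwarding fault.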
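One aspect of this application provides a method for determining the location of a forwarding fault. The method includes: a controller receives, from one or more forwarders, multiple table lookup statistical results of multiple forwarding tables for a first service, where the multiple forwarding tables have a query order for the first service, the multiple forwarding tables include a first forwarding table and a second forwarding table adjacent to the first forwarding table in the query order, and the first forwarding table precedes the second forwarding table in the query order; the controller determines first information based on the table lookup statistical result of the first forwarding table for the first service and the table lookup statistical result of the second forwarding table for the first service, where the first information indicates that (1) the table lookup behavior of the first forwarding table for the first service is normal and (2) the table lookup behavior of the second forwarding table for the first service is abnormal; and the controller determines, based on the first information, that the second forwarding table is the forwarding table that has failed for the first service.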
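In a possible design, the multiple forwarding tables further include a third forwarding table that follows the second forwarding table in the query order, and the method includes: the controller determines second information, where the second information indicates that the table lookup statistical result of the third forwarding table for the first service is the same as or similar to that of the second forwarding table for the first service; and the controller determines, based on the second information, that the third forwarding table has not failed for the first service.
In a possible design, the multiple forwarding tables further include a fourth forwarding table and a fifth forwarding table, and the controller receiving the multiple table lookup statistical results of the multiple forwarding tables for the first service includes: the controller receives the query statistical result of the fourth forwarding table for the first service and the query statistical result of the fifth forwarding table for the first service, where the fourth forwarding table precedes the first forwarding table in the query order, the fifth forwarding table follows the second forwarding table in the query order, and the controller has not received the query statistical results for the first service of the forwarding tables between the fourth forwarding table and the fifth forwarding table; the controller determines third information, where the third information indicates that the table lookup behavior of the fourth forwarding table for the first service is normal and that of the fifth forwarding table for the first service is abnormal; and the controller requests, based on the third information and from the forwarder or forwarders hosting the forwarding tables between the fourth and fifth forwarding tables, the query statistical results of those forwarding tables for the first service.
In a possible design, the controller determining, based on the first information, that the second forwarding table is the faulty forwarding table includes: determining the table lookup statistical result of a sixth forwarding table for the first service, where the sixth forwarding table is of the same type as the second forwarding table, precedes the second forwarding table in the query order, and is located in a different forwarder from the second forwarding table; the controller determines fourth information, where the fourth information indicates that the table lookup statistical result of the sixth forwarding table for the first service is different from and not similar to that of the second forwarding table for the first service; and the controller determines, based on the first information and the fourth information, that the second forwarding table is the faulty forwarding table.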
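In a possible design, the first forwarding table and the second forwarding table are located in one forwarder, or in different forwarders.
In a possible design, the controller is a software-defined network (SDN) controller, and the forwarder is an SDN forwarder.
In a possible design, the first service is a native IPv4 service, an IPv4 service based on generic routing encapsulation (IPv4 over GRE), a native IPv6 service, an IPv6 service based on IPv4 (IPv6 over IPv4), a Layer 3 virtual private network service based on segment routing (L3VPN over Segment Routing), a Layer 2 virtual private network service based on traffic engineering (L2VPN over TE), or an Ethernet virtual private network service based on a virtual extensible local area network (EVPN over VxLAN).
In a possible design, the first service is an IPv4 service, an IPv6 service, an L3VPN service, an L2VPN service, or an EVPN service.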
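Another aspect of this application provides a controller. The controller includes:
a receiving unit, configured to receive, from one or more forwarders, multiple table lookup statistical results of multiple forwarding tables for a first service, where the multiple forwarding tables have a query order for the first service, the multiple forwarding tables include a first forwarding table and a second forwarding table adjacent to the first forwarding table in the query order, and the first forwarding table precedes the second forwarding table in the query order;
a state determining unit, configured to determine first information based on the table lookup statistical result of the first forwarding table for the first service and the table lookup statistical result of the second forwarding table for the first service, where the first information indicates that (1) the table lookup behavior of the first forwarding table for the first service is normal and (2) the table lookup behavior of the second forwarding table for the first service is abnormal; and
a fault determining unit, configured to determine, based on the first information, that the second forwarding table is the forwarding table that has failed for the first service.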
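In a possible design, the fault determining unit is further configured to: determine second information, where the second information indicates that the table lookup statistical result of the third forwarding table for the first service is the same as or similar to that of the second forwarding table for the first service; and determine, based on the second information, that the third forwarding table has not failed for the first service.
In a possible design, the multiple forwarding tables further include a fourth forwarding table and a fifth forwarding table, and the receiving unit is specifically configured to: receive the query statistical result of the fourth forwarding table for the first service and the query statistical result of the fifth forwarding table for the first service, where the fourth forwarding table precedes the first forwarding table in the query order, the fifth forwarding table follows the second forwarding table in the query order, and the controller has not received the query statistical results for the first service of the forwarding tables between the fourth forwarding table and the fifth forwarding table; determine third information, where the third information indicates that the table lookup behavior of the fourth forwarding table for the first service is normal and that of the fifth forwarding table for the first service is abnormal; and request, based on the third information and from the forwarder or forwarders hosting the forwarding tables between the fourth and fifth forwarding tables, the query statistical results of those forwarding tables for the first service.
In a possible design, the fault determining unit is specifically configured to: determine the table lookup statistical result of a sixth forwarding table for the first service, where the sixth forwarding table is of the same type as the second forwarding table, precedes the second forwarding table in the query order, and is located in a different forwarder from the second forwarding table; determine fourth information, where the fourth information indicates that the table lookup statistical result of the sixth forwarding table for the first service is different from and not similar to that of the second forwarding table for the first service; and determine, based on the first information and the fourth information, that the second forwarding table is the faulty forwarding table.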
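In a possible design, the first forwarding table and the second forwarding table are located in one forwarder, or in different forwarders.
In a possible design, the controller is a software-defined network (SDN) controller, and the forwarder is an SDN forwarder.
In a possible design, the first service is a native IPv4 service, an IPv4 service based on generic routing encapsulation (IPv4 over GRE), a native IPv6 service, an IPv6 service based on IPv4 (IPv6 over IPv4), a Layer 3 virtual private network service based on segment routing (L3VPN over Segment Routing), a Layer 2 virtual private network service based on traffic engineering (L2VPN over TE), or an Ethernet virtual private network service based on a virtual extensible local area network (EVPN over VxLAN).
In a possible design, the first service is an IPv4 service, an IPv6 service, an L3VPN service, an L2VPN service, or an EVPN service.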
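The fault determining method and the controller can efficiently identify the forwarding table in which a forwarding fault has occurred, rather than merely locating the forwarding device in which the fault occurred. The fault can therefore be located more precisely, which helps the network recover from it faster.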
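Brief Description of Drawings
FIG. 1 is a schematic structural diagram of a network 100 in an embodiment of the present invention;
FIG. 2 is a flowchart of a method 200 for querying different forwarding tables within one forwarder in an embodiment of the present invention;
FIG. 3 is a flowchart of a method 300 for querying different forwarding tables within one forwarder in an embodiment of the present invention;
FIG. 4 is a flowchart of a method 400 for determining a forwarding anomaly in an embodiment of the present invention;
FIG. 5 is a flowchart of a method 500 for determining a forwarding anomaly in an embodiment of the present invention;
FIG. 6 is a flowchart of a method 600 in which a controller receives multiple table lookup statistical results in an embodiment of the present invention;
FIG. 7 is a flowchart of a method 700 for determining a faulty forwarding table in an embodiment of the present invention;
FIG. 8 is a schematic structural diagram of a controller 800 in an embodiment of the present invention;
FIG. 9 is a schematic structural diagram of a controller 900 in an embodiment of the present invention.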
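Detailed Description
FIG. 1 is a schematic diagram of a network structure in an embodiment of the present invention. FIG. 1 shows the structure of a network 100, which includes a controller 110 and forwarders 120 (120a-120d). Each forwarder is connected to and controlled by the controller 110. In this embodiment the number of forwarders may vary, and the network 100 may include more or fewer forwarders. In the network shown in FIG. 1, forwarder 120a and forwarder 120d may be provider edge (PE) devices, and forwarder 120b and forwarder 120c may be provider (P) devices. One or more further P devices may exist between forwarder 120b and forwarder 120c. These P devices are located inside the provider network, whereas forwarder 120a and forwarder 120d are located at its edge and connect to a customer network or to another provider network. Forwarders 120a-120d may also be forwarders in an enterprise network, or forwarders in a data center inside an enterprise or a provider. Forwarders 120a and 120d need not be located at the network edge; they may also be located inside the network. The controller 110 may be a software-defined network (SDN) controller and forwarders 120a-120d may be SDN forwarders; alternatively, the controller 110 may be a network management device in a traditional Internet Protocol (IP) network and forwarders 120a-120d may be switches or routers in a traditional IP network.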
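To forward a packet, one forwarder may need to query several different forwarding tables for that packet, and different forwarders may each need to query different forwarding tables for the same packet.
FIG. 2 is a flowchart of a method 200 for querying different forwarding tables within one forwarder in an embodiment of the present invention.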
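202. A forwarder, for example forwarder 120a, looks up forwarding information base (FIB) entry FIB6 260 based on the virtual router identification (VRID) carried in a received packet and the destination IP address of the packet, and obtains the index of a route entry (RE). The destination IP address of the packet is an IPv6 address.
204. Based on the RE index, the forwarder determines route entry RE6 262 and further queries RE6 262 using the destination IP address of the packet, obtaining the next-hop IP address of the packet and a tunnel index.
206. The forwarder queries tunnel entry TNL 264 based on the tunnel index and obtains the tunnel type and the destination IP address of the tunnel. The destination IP address of the tunnel is an IPv4 address; TNL is an abbreviation of "tunnel".
208. The forwarder queries the IPv4 forwarding information base based on the destination IP address of the tunnel, hits entry FIB4 266, and obtains the index of a route entry (RE).
209. Based on the index obtained in 208, the forwarder determines route entry RE4 267 and further queries RE4 267 using the destination IP address of the tunnel, obtaining the outbound port information and the next-hop IP address of the packet. The outbound port information includes a port number or a TRUNK ID.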
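210. A traffic management (TM) module sends the packet from the inbound interface board to the outbound interface board. Note that in this embodiment the forwarder has separate inbound and outbound interface boards: a packet enters the forwarder through the inbound interface board and leaves it through the outbound interface board. Operations 202-206 are performed by the inbound interface board, and 212 and the subsequent operations are performed by the outbound interface board. In some forwarders, however, a single interface board can both receive and send packets. Such a forwarder may have only one interface board, or it may have several. When an interface board with both capabilities receives a packet, the TM module does not need to, or cannot, send the packet from that board to another board; instead, that board itself sends the packet.
212. The forwarder queries the tunnel table based on the tunnel index, hits tunnel entry TNL 268, and obtains the tunnel encapsulation mode, source IP address, destination IP address, outbound-port VLAN ID (OVID), maximum transmission unit (MTU), and so on. In this embodiment the tunnel encapsulation mode may be IPv6 over IPv4.
214. Based on the outbound port information and next-hop IP address obtained in 208 and the OVID obtained in 212, the forwarder queries the link-layer encapsulation table, hits link-layer encapsulation entry 270, and obtains a destination media access control (MAC) address. It then performs link-layer encapsulation based on the destination MAC address and other related information so that the tunnel packet in IPv6 over IPv4 format can be transmitted in the tunnel. The outer destination IP address of the tunnel packet is in IPv4 format, that is, it is an IPv4 address.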
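FIG. 3 is a flowchart of a method 300 for querying different forwarding entries within one forwarder in an embodiment of the present invention.
The content shown in FIG. 3 continues that of FIG. 2. In the table lookup order, the forwarder that performs the method of FIG. 3 comes after the forwarder that performs the method of FIG. 2; in other words, it is located downstream of that forwarder.
302. A forwarder, for example forwarder 120b, receives the tunnel packet in IPv6 over IPv4 format from the forwarder that performed 214, for example forwarder 120a. Based on the VRID carried in the tunnel packet and its IPv4 destination IP address, the forwarder queries the FIB on the outbound interface board, hits FIB entry FIB4 360, and obtains an RE index.
304. The forwarder determines the corresponding RE entry RE4 362 based on the RE index and queries RE4 362 using the destination IP address of the tunnel packet, obtaining the next-hop IP address and the outbound port information of the tunnel packet. The outbound port information is an outbound port number or a TRUNK ID.
306. The forwarder queries the Address Resolution Protocol (ARP) table based on the next-hop IP address of the tunnel packet, the outbound port information, and the OVID, hits ARP entry ARP 364, and obtains a destination MAC address. The forwarder can then use this destination MAC address to continue forwarding the tunnel packet.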
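As described above, flowcharts 200 and 300 show how two adjacent forwarders perform multiple table lookups and, once the lookups succeed, forward a tunnel packet. A forwarder can forward packets belonging to different services; tunnel packets are only one kind. To forward a packet of any service, a forwarder may need to perform multiple table lookups. After every lookup succeeds, the packet is forwarded to the next forwarder according to the lookup result that indicates the next hop. The forwarded packet may have undergone some processing, such as encapsulation, decapsulation, and security checks. As flowcharts 200 and 300 show, each forwarder must query its internal entries in sequence in order to forward the packet. For a given packet, the entries inside a forwarder can therefore be understood as having a lookup order. For example, for the IPv6 over IPv4 tunnel packet in this embodiment, forwarding entries 260, 262, 264, 266, 268, and 270 can be regarded as arranged in lookup order. FIB6 260 is the first of these entries and link-layer encapsulation entry 270 is the last, and each entry has neighbor entries along the lookup order: for example, FIB6 260 is adjacent to RE6 262, and RE6 262 is adjacent to both TNL 264 and FIB6 260.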
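
The lookup order and the notion of neighbor entries can be pictured as a simple ordered list. The following Python sketch is illustrative only: the names `LOOKUP_ORDER` and `neighbors` are hypothetical, and the list assumes that RE4 267 (step 209) and the entries 360-364 of FIG. 3 take their places in the sequence as the step descriptions suggest.

```python
from typing import List, Optional, Tuple

# Entries in the order they are queried for the IPv6 over IPv4 tunnel packet.
# Assumption: the list follows the step order of FIG. 2 and then FIG. 3;
# the strings are labels only, with "LE 270" standing for the link-layer
# encapsulation entry 270.
LOOKUP_ORDER: List[str] = [
    "FIB6 260", "RE6 262", "TNL 264", "FIB4 266", "RE4 267", "TNL 268", "LE 270",
    "FIB4 360", "RE4 362", "ARP 364",
]


def neighbors(order: List[str], entry: str) -> Tuple[Optional[str], Optional[str]]:
    """Return the entries queried immediately before and after `entry`."""
    i = order.index(entry)
    before = order[i - 1] if i > 0 else None
    after = order[i + 1] if i + 1 < len(order) else None
    return before, after
```

With this list, the neighbor that follows "LE 270" is "FIB4 360", which reflects that adjacency in the lookup order can cross from one forwarder to the next.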
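FIG. 4 is a flowchart of a method 400 for determining a forwarding anomaly in an embodiment of the present invention.
402. The forwarder 120 receives packets of a first service, queries multiple forwarding tables inside the forwarder 120, processes the packets according to the results of querying the multiple forwarding tables, and sends the processed packets into the network. Taking the scenario of FIG. 2 as an example, the multiple forwarding tables may be some or all of the following: the FIB table containing forwarding entry FIB6 260, the RE6 table containing forwarding entry RE6 262, the TNL table containing forwarding entry TNL 264, the FIB4 table containing forwarding entry FIB4 266, the TNL table containing forwarding entry TNL 268, and the link-layer encapsulation table containing link-layer encapsulation entry 270. "Multiple" means two or more.
404. For the packets of the first service, the forwarder 120 determines the table lookup statistical result of each of the multiple forwarding tables for the first service. A table lookup statistical result may indicate the number of successes, the success rate, the number of failures, or the failure rate with which the forwarder 120 queried a forwarding table using the packets of the first service received in a period of time. For example, forwarder 120a receives 100 packets of the first service within 5 seconds and uses these 100 packets to query the FIB containing FIB6 260 100 times, of which 90 queries obtain a result from FIB6 260. In this case, the table lookup statistical result of this FIB forwarding table for the first service may be 90 successes, a 90% success rate, 10 failures, or a 10% failure rate.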
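
The worked example in 404 (100 packets, 90 successful lookups) can be reproduced with a few lines of Python. This is a minimal sketch; the `LookupStats` type and the `summarize` helper are hypothetical names introduced for illustration, not part of the described method.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class LookupStats:
    """Lookup statistics of one forwarding table for one service over a time window."""
    table: str
    successes: int
    failures: int

    @property
    def total(self) -> int:
        return self.successes + self.failures

    @property
    def success_rate(self) -> float:
        return self.successes / self.total if self.total else 0.0

    @property
    def failure_rate(self) -> float:
        return self.failures / self.total if self.total else 0.0


def summarize(table: str, hits: Sequence[bool]) -> LookupStats:
    """Build statistics from per-packet lookup outcomes (True means the lookup hit an entry)."""
    successes = sum(1 for hit in hits if hit)
    return LookupStats(table, successes, len(hits) - successes)


# 100 packets of the first service, 90 of which hit FIB6 260.
stats = summarize("FIB6", [True] * 90 + [False] * 10)
```

Any one of the four derived values (`successes`, `success_rate`, `failures`, `failure_rate`) could serve as the reported result, as the text allows.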
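406. Some or all of the forwarders 120 send the multiple table lookup statistical results of the multiple forwarding tables to the controller 110. Taking the scenario of FIG. 2 as an example, the forwarder may send some or all of the following to the controller 110: the table lookup statistical results of FIB6 260, RE6 262, TNL 264, FIB4 266, TNL 268, and link-layer encapsulation entry 270. "The table lookup statistical result of FIB6 260" is shorthand for the table lookup statistical result of the forwarding table containing FIB6 260; the same shorthand applies to the other table lookup statistical results.
408. The controller 110 determines the faulty forwarding table based on the received table lookup statistical results.
The table lookup statistical results received by the controller 110 may fall into several cases.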
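Case 1
The multiple table lookup statistical results include those of two tables that are adjacent for the service, for example the FIB table containing FIB6 260 and the RE table containing RE6 262. Based on these two results, the controller 110 can determine that the result of the table queried first (for example the FIB table containing FIB6 260) is normal and that the result of the table queried later (for example the RE table containing RE6 262) is abnormal. The two adjacent tables may be inside the same forwarder or may belong to two adjacent forwarders. For example, when the table queried later is the one containing FIB4 360 shown in FIG. 3, the table queried first is not in the same forwarder as FIB4 360; the two tables are in two adjacent forwarders.
Whether the table lookup statistical result of a forwarding table is normal or abnormal can be determined by comparing the number of successes, the success rate, the number of failures, or the failure rate of queries to that table against a threshold. The comparison can be made by the controller or by the forwarder. If the forwarder makes it, the controller can learn directly from the forwarder whether the table lookup statistical result of a forwarding table is normal. Normal and abnormal can also be relative notions. The controller can determine whether the result of a later forwarding table has deteriorated, relative to that of the adjacent earlier forwarding table, by at least a threshold. If the threshold is reached, the lookup behavior of the later forwarding table for the packets can be considered abnormal, and the lookup behavior of the earlier forwarding table for packets of the same service can be considered normal. For example, if the lookup success rate of the earlier FIB6 260 is 90% and that of the adjacent later RE6 262 is 60%, the difference in success rate is 30%. If the threshold is 20%, then the lookup behavior related to RE6 262 is abnormal, that is, abnormal relative to that of FIB6 260, and the lookup behavior related to FIB6 260 is normal, that is, normal relative to that of RE6 262.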
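
The two ways of judging a result described above, against an absolute threshold or relative to the preceding adjacent table, might be sketched as follows. The function names are hypothetical, rates are written as whole percentages, and treating a drop exactly equal to the threshold as abnormal is an assumption based on the wording that the threshold is "reached".

```python
def is_normal_absolute(success_rate: int, min_success_rate: int) -> bool:
    """Absolute check: the result is normal if the success rate meets a minimum (assumed form)."""
    return success_rate >= min_success_rate


def later_is_abnormal(earlier_rate: int, later_rate: int, drop_threshold: int) -> bool:
    """Relative check: the later table is abnormal if its success rate has dropped,
    compared with the adjacent earlier table, by at least `drop_threshold` points."""
    return (earlier_rate - later_rate) >= drop_threshold


# FIB6 260 at 90%, RE6 262 at 60%, threshold 20%: the drop is 30 points.
re6_abnormal = later_is_abnormal(90, 60, 20)  # True
```

Whether the absolute or the relative check is used, and which statistic it applies to, is left open by the text; a failure-rate version would simply invert the comparisons.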
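For case 1, because the table lookup statistical result of the forwarding table queried first is normal and that of the forwarding table queried later is abnormal, the controller 110 determines that the forwarding table queried later is the faulty forwarding table.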
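
Case 1 amounts to scanning the tables in lookup order for the first adjacent pair in which the earlier table is normal and the later one is abnormal. A minimal Python sketch, with a hypothetical function name and with each table already classified as normal or abnormal, could look like this:

```python
from typing import Optional, Sequence, Tuple


def locate_faulty_table(ordered: Sequence[Tuple[str, bool]]) -> Optional[str]:
    """`ordered` holds (table name, is_normal) pairs in lookup order.

    Returns the later table of the first adjacent normal/abnormal pair,
    or None if no such pair exists.
    """
    for (_, earlier_ok), (later_name, later_ok) in zip(ordered, ordered[1:]):
        if earlier_ok and not later_ok:
            return later_name
    return None


reports = [("FIB6 260", True), ("RE6 262", False), ("TNL 264", False)]
faulty = locate_faulty_table(reports)  # "RE6 262"
```

Tables after the faulty one may also look abnormal, as TNL 264 does here; the scan stops at the first transition, so they are not blamed.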
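Case 2
The table lookup statistical results received by the controller 110 include those of two tables that are not adjacent for the service, include no results for any table lying between these two tables, and show that the result of the table queried first is normal while that of the table queried later is abnormal. The two non-adjacent tables may be on the same interface board, on different interface boards, or in different forwarders. For example, the controller 110 has received the table lookup statistical results of the RE table containing RE6 262 and of the RE table containing RE4 267, which are on the same interface board, but has received neither the result of the TNL table containing TNL 264 nor the result of the FIB table containing FIB4 266. In this example the controller 110 may also have received the result of the FIB table containing FIB6 260 and the results of tables that come after RE4 267 in the lookup order, for example the TNL table containing TNL 268 and the LE table containing LE 270.
For case 2, the controller 110 instructs the one or two forwarders containing the two tables to send it the table lookup statistical results of the one or more intermediate tables that lie between the two tables and are used to forward the service. When the two tables belong to the same forwarder, that forwarder can upload the results as instructed by the controller 110. When the two tables belong to two forwarders, whichever of those two forwarders and of the one or more forwarders between them store the intermediate tables can, as instructed by the controller 110, send it the table lookup statistical results of the intermediate tables for the service.
Based on the received table lookup statistical results of the one or more intermediate tables, the controller 110 determines the abnormal lookup action. Specifically, it may: determine, from among the two non-adjacent tables and the intermediate tables, two adjacent tables such that the result of the forwarding table queried first is normal and the result of the forwarding table queried later is abnormal; and determine that the forwarding table queried later is the faulty forwarding table. In one example, the two non-adjacent tables are the RE table containing RE6 262 and the RE table containing RE4 267, and the intermediate tables are the TNL table containing TNL 264 and the FIB table containing FIB4 266. When the controller determines that the result of the FIB table containing FIB4 266 is normal and the result of the RE table containing RE4 267 is abnormal, it determines that the RE table containing RE4 267 is the faulty forwarding table.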
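
The two-phase handling of case 2 can be sketched in Python as follows. The sketch assumes the controller knows the full lookup order and holds a normal/abnormal verdict for each reported table; `first_break`, `tables_to_request`, and `faulty_table` are hypothetical helper names, and how the request is actually sent to the forwarders is not modeled.

```python
from typing import Dict, List, Optional, Tuple

ORDER = ["FIB6 260", "RE6 262", "TNL 264", "FIB4 266", "RE4 267", "TNL 268", "LE 270"]


def first_break(order: List[str], status: Dict[str, bool]) -> Optional[Tuple[str, str]]:
    """Among reported tables, find the first (normal, abnormal) pair in lookup order."""
    reported = [t for t in order if t in status]
    for earlier, later in zip(reported, reported[1:]):
        if status[earlier] and not status[later]:
            return earlier, later
    return None


def tables_to_request(order: List[str], status: Dict[str, bool]) -> List[str]:
    """Unreported intermediate tables whose statistics the controller still needs."""
    pair = first_break(order, status)
    if pair is None:
        return []
    return order[order.index(pair[0]) + 1:order.index(pair[1])]


def faulty_table(order: List[str], status: Dict[str, bool]) -> Optional[str]:
    """The faulty table, once the normal/abnormal pair is adjacent in the lookup order."""
    pair = first_break(order, status)
    if pair is None or tables_to_request(order, status):
        return None
    return pair[1]


status = {"RE6 262": True, "RE4 267": False}         # initial, sparse reports
missing = tables_to_request(ORDER, status)           # ["TNL 264", "FIB4 266"]
status.update({"TNL 264": True, "FIB4 266": True})   # results received on request
culprit = faulty_table(ORDER, status)                # "RE4 267"
```

Requesting only the tables inside the gap keeps the volume of reported statistics small, which appears to be the motivation for case 2.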
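After the controller 110 determines, by the foregoing method, that a forwarding table is the forwarding table that has failed for a service, the controller 110 may treat this as a definite result, or may treat it as a possible result that requires further verification. In the latter case the controller 110 can verify the possible result. When the controller 110 determines that a forwarding table on one forwarder may have failed for a service, it can compare the table lookup statistical result of that forwarding table with the table lookup statistical result, for the same service, of a forwarding table of the same type on another forwarder, where, for the service, the same-type forwarding table on the other forwarder precedes the possibly faulty forwarding table in the lookup order. In one example, the controller 110 determines that the RE table containing RE4 362 may have failed for a service. The controller 110 can then compare the table lookup statistical result of that RE table for the service with that of a same-type forwarding table at an earlier position in the lookup order, namely the forwarding table containing RE4 267. If the table lookup statistical result of that same-type forwarding table is normal, the controller 110 can determine with greater confidence that the RE table containing RE4 362 has failed for the service.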
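
The cross-forwarder verification could be sketched as below. The `Table` record, the `upstream_peer` and `confirm_fault` helpers, and the choice of the nearest preceding same-type table are all assumptions made for illustration; the text only requires a same-type table on another forwarder that comes earlier in the lookup order.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Table:
    name: str        # label of the entry or table, e.g. "RE4 362"
    kind: str        # table type, e.g. "RE4"
    forwarder: str   # forwarder hosting the table, e.g. "120b"
    normal: bool     # verdict on its lookup statistics for the service


def upstream_peer(chain: List[Table], suspect: Table) -> Optional[Table]:
    """Nearest earlier table of the same type located on a different forwarder."""
    for candidate in reversed(chain[:chain.index(suspect)]):
        if candidate.kind == suspect.kind and candidate.forwarder != suspect.forwarder:
            return candidate
    return None


def confirm_fault(chain: List[Table], suspect: Table) -> bool:
    """Confirm the suspect only if it is abnormal and its upstream peer is normal."""
    peer = upstream_peer(chain, suspect)
    return (not suspect.normal) and peer is not None and peer.normal


re4_267 = Table("RE4 267", "RE4", "120a", True)
fib4_360 = Table("FIB4 360", "FIB4", "120b", True)
re4_362 = Table("RE4 362", "RE4", "120b", False)
confirmed = confirm_fault([re4_267, fib4_360, re4_362], re4_362)  # True
```

If no same-type table exists upstream, this sketch leaves the suspicion unconfirmed; the text does not say how that situation is handled.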
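In the embodiments of the present invention, the controller 110 can determine the faulty forwarding table based on the lookup statistics that forwarders report for multiple forwarding tables for the same service. The fault can therefore be located precisely, so that it can subsequently be handled, for example cleared.
FIG. 5 is a flowchart of a method 500 for determining a forwarding anomaly according to an embodiment of the present invention.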
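505. A controller receives multiple lookup statistics of multiple forwarding tables for a first service, where the multiple forwarding tables have a lookup order for the first service, the multiple forwarding tables include a first forwarding table and a second forwarding table adjacent to the first forwarding table in the lookup order, and the first forwarding table precedes the second forwarding table in the lookup order. The controller may be the controller 110 shown in FIG. 1 or FIG. 4.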
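The foregoing embodiments have described, with reference to FIG. 2 and FIG. 3, that the forwarding tables in one or more forwarders have a lookup order for a service; for the lookup order and related concepts in this embodiment, refer to that description. The one or more forwarders may be one or more of the forwarders 120 shown in FIG. 1 or FIG. 4. In one example, the first forwarding table may be the FIB table containing FIB6 260 and the second forwarding table the RE table containing RE6 262; or the first forwarding table may be the RE table containing RE4 267 and the second forwarding table the TNL table containing TNL 268; or the first forwarding table may be the LE table containing LE 270 and the second forwarding table the FIB table containing FIB4 360. The multiple forwarding tables may be all or some of the forwarding tables corresponding to entries 260-270 shown in FIG. 2, all or some of the forwarding tables corresponding to entries 360-364 shown in FIG. 3, or all or some of the forwarding tables corresponding to entries 260-268 and 360-364 in FIG. 2 and FIG. 3.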
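The controller can receive the multiple lookup statistics of the multiple forwarding tables for the first service in several ways.
In one example, the controller obtains the multiple lookup statistics of the multiple forwarding tables for the first service based on a single request, or without sending any request.
In another example, the controller first obtains one or more lookup statistics, for the first service, of some of the multiple forwarding tables, and then, when needed, obtains one or more lookup statistics for the first service of some or all of the remaining forwarding tables by sending requests to one or more forwarders. The method shown in FIG. 6 is a specific example of receiving the multiple lookup statistics in this way; the details of FIG. 6 are described later in this application.
After the controller receives the multiple lookup statistics, the method 500 proceeds to 510.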
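510. The controller determines first information based on the lookup statistics of the first forwarding table for the first service and the lookup statistics of the second forwarding table for the first service, where the first information indicates that the lookup behavior of the first forwarding table for the first service is normal and the lookup behavior of the second forwarding table for the first service is abnormal.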
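The foregoing embodiments have already described what it means for the lookup behavior of a forwarding table for a service to be normal or abnormal; for these and related concepts in this embodiment, refer to that description. Whether the lookup statistics of a forwarding table for a service are normal or abnormal can be determined by comparing the number of successful lookups, the success rate, the number of failed lookups, or the failure rate of that forwarding table for the service against a threshold, or by other similar operations. The comparison can be performed by the controller or by the forwarder. If the forwarder performs it, the controller can learn directly from the forwarder whether the lookup statistics of a forwarding table are normal. In addition, normal and abnormal can also be relative notions. The controller can determine whether the lookup statistics of a later forwarding table are worse than those of the adjacent earlier forwarding table by at least a threshold. If the threshold is reached, the lookup behavior of the later forwarding table for packets of the service is considered abnormal, and the lookup behavior of the earlier forwarding table for packets of the same service is considered normal. For example, if the lookup success rate of the earlier FIB6 260 is 90% and that of the adjacent later RE6 262 is 60%, the difference in success rate is 30 percentage points. If the threshold is 20 percentage points, the lookup behavior related to RE6 262 is considered abnormal, that is, abnormal relative to the lookup behavior related to FIB6 260; and the lookup behavior related to FIB6 260 is considered normal, that is, normal relative to the lookup behavior related to RE6 262.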
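The first service in this embodiment may be classified at different granularities. At a finer granularity, the first service may be a native IPv4 service, an IPv4 over Generic Routing Encapsulation (IPv4 over GRE) service, a native IPv6 service, an IPv6 over IPv4 service, a Layer 3 virtual private network over Segment Routing (L3VPN over Segment Routing) service, a Layer 2 virtual private network over traffic engineering (L2VPN over TE) service, an Ethernet virtual private network over Virtual Extensible LAN (EVPN over VxLAN) service, or another similar service. At a coarser granularity, the first service may be an IPv4 service, an IPv6 service, an L3VPN service, an L2VPN service, or an EVPN service.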
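One way to picture the two granularities is to key lookup counters by the fine-grained service and roll them up into the coarse-grained one. The text only lists the service names; the mapping between the two lists and the roll-up itself are assumptions made here for illustration, and the function and variable names are hypothetical.

```python
from typing import Dict, Tuple

# Assumed mapping from fine-grained to coarse-grained service names.
COARSE_SERVICE: Dict[str, str] = {
    "native IPv4": "IPv4",
    "IPv4 over GRE": "IPv4",
    "native IPv6": "IPv6",
    "IPv6 over IPv4": "IPv6",
    "L3VPN over Segment Routing": "L3VPN",
    "L2VPN over TE": "L2VPN",
    "EVPN over VxLAN": "EVPN",
}


def aggregate_by_coarse_service(
        fine_stats: Dict[str, Tuple[int, int]]) -> Dict[str, Tuple[int, int]]:
    """fine_stats maps a fine-grained service to
    (successful lookups, total lookups) for one forwarding table."""
    coarse: Dict[str, Tuple[int, int]] = {}
    for service, (ok, total) in fine_stats.items():
        key = COARSE_SERVICE[service]
        prev_ok, prev_total = coarse.get(key, (0, 0))
        coarse[key] = (prev_ok + ok, prev_total + total)
    return coarse


print(aggregate_by_coarse_service(
    {"native IPv4": (90, 100), "IPv4 over GRE": (60, 100)}))
# {'IPv4': (150, 200)}
```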
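515. The controller determines, based on the first information, that the second forwarding table is the forwarding table that is faulty for the first service.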
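In the first information, the lookup behavior of the earlier first forwarding table for the first service is normal, and the lookup behavior of its later neighbor, namely the second forwarding table, is abnormal for the first service. Because the lookup behavior of the first forwarding table for the first service is normal, the first forwarding table is not faulty for the first service, and the second forwarding table is the forwarding table that is faulty for the first service. In addition, in some examples, the controller may further verify whether the second forwarding table is the faulty forwarding table, and make the final determination only after the verification confirms that it is. This verification can also be regarded as part of the operation in which the controller determines, based on the first information, that the second forwarding table is faulty for the first service. The details of the verification are described in the embodiment corresponding to FIG. 7.
The embodiment shown in FIG. 5 may further include optional operations 520 and 525.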
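520. The controller determines second information, where the second information indicates that the lookup statistics, for the first service, of a third forwarding table among the multiple forwarding tables are the same as or similar to the lookup statistics of the second forwarding table for the first service, and the third forwarding table follows the second forwarding table in the lookup order.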
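For example, if the first forwarding table is the FIB table containing FIB6 260 shown in FIG. 2 and the second forwarding table is the RE table containing RE6 262, the third forwarding table is the TNL table containing TNL 264. When the second forwarding table has been determined to be faulty for the first service, and the lookup statistics of the third forwarding table for the first service are the same as or similar to those of the second forwarding table, the abnormal lookup results of the third forwarding table for the first service are caused by the fault in the second forwarding table, and the lookup behavior of the third forwarding table itself is normal. Here, "similar" means that the difference between the two compared objects is smaller than a set criterion, which a network administrator can choose based on practical experience.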
525. The controller determines, based on the second information, that the third forwarding table is not faulty for the first service.
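Operations 520 and 525 can be sketched as a filter over the tables that follow the faulty one: any later table whose statistics are within a set tolerance of the faulty table's statistics is treated as not faulty itself. The function name, the use of success rates as the compared statistic, and the sample figures are all assumptions for illustration.

```python
from typing import Dict, List


def cleared_downstream_tables(faulty_rate: float,
                              downstream_rates: Dict[str, float],
                              tolerance: float) -> List[str]:
    """Return the later tables whose success rate differs from the faulty
    table's rate by less than tolerance (same or similar statistics)."""
    return [name for name, rate in downstream_rates.items()
            if abs(rate - faulty_rate) < tolerance]


# Hypothetical figures: the faulty second table succeeds 60% of the time.
print(cleared_downstream_tables(60, {"TNL 264": 58, "FIB4 266": 30}, 5))
# ['TNL 264']
```

This step only clears tables with similar statistics; a later table whose statistics differ markedly (FIB4 266 in the hypothetical figures) is simply not covered by it.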
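FIG. 6 is a flowchart of a method 600 by which the controller receives multiple lookup statistics according to an embodiment of the present invention. The method is a specific example of implementing 505. The method shown in FIG. 6 may be used together with 505-515 or with 505-525.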
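605. The controller receives the lookup statistics, for the first service, of a fourth forwarding table among the multiple forwarding tables and of a fifth forwarding table among the multiple forwarding tables, where the fourth forwarding table precedes the first forwarding table in the lookup order, the fifth forwarding table follows the second forwarding table in the lookup order, and the controller has not received the lookup statistics, for the first service, of the forwarding tables between the fourth forwarding table and the fifth forwarding table.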
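In one example, the first forwarding table and the second forwarding table are the two forwarding tables containing TNL 264 and FIB4 266, respectively. In this case, the fourth forwarding table may be the forwarding table containing RE6 262 or the forwarding table containing FIB6 260, and the fifth forwarding table may be the forwarding table containing any one of forwarding entries 267-270, or the forwarding table containing any one of forwarding entries 267-364.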
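610. The controller determines third information, where the third information indicates that the lookup behavior of the fourth forwarding table for the first service is normal and the lookup behavior of the fifth forwarding table for the first service is abnormal.
615. Based on the third information, the controller requests, from the forwarder or forwarders that hold the forwarding tables between the fourth forwarding table and the fifth forwarding table, the lookup statistics of those forwarding tables for the first service.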
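When the forwarding tables between the fourth forwarding table and the fifth forwarding table are located in the same forwarder, the controller can request their lookup statistics from that forwarder. When they are located in different forwarders, the controller needs to request the lookup statistics from each of those forwarders.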
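Operations 605 to 615 can be sketched as follows: find the last table known to be normal and the first table known to be abnormal, then group the unreported tables between them by the forwarder that holds them. The function name, the data layout, and the forwarder label `forwarder-1` are hypothetical; the patent does not prescribe a request format.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple


def tables_to_request(order: Sequence[Tuple[str, str]],
                      is_normal: Dict[str, bool]) -> Dict[str, List[str]]:
    """order: (table, forwarder) pairs in lookup order.
    is_normal: status of the tables whose statistics were received.
    Returns the missing intermediate tables, grouped by forwarder."""
    last_normal: Optional[int] = None
    for i, (table, _forwarder) in enumerate(order):
        if table not in is_normal:
            continue                  # no statistics received for this table
        if is_normal[table]:
            last_normal = i           # most recent table known to be normal
            continue
        if last_normal is None:
            return {}                 # no normal table precedes the abnormal one
        requests: Dict[str, List[str]] = defaultdict(list)
        for missing, forwarder in order[last_normal + 1:i]:
            requests[forwarder].append(missing)
        return dict(requests)
    return {}


order = [("FIB6 260", "forwarder-1"), ("RE6 262", "forwarder-1"),
         ("TNL 264", "forwarder-1"), ("FIB4 266", "forwarder-1"),
         ("RE4 267", "forwarder-1")]
received = {"FIB6 260": True, "RE6 262": True, "RE4 267": False}
print(tables_to_request(order, received))
# {'forwarder-1': ['TNL 264', 'FIB4 266']}
```

An empty result means either that the normal and abnormal tables are already adjacent, so no request is needed, or that no bracketing pair was found.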
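FIG. 7 is a flowchart of a method 700 for determining a faulty forwarding table according to an embodiment of the present invention. The method is a specific implementation of 515.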
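705. The controller determines the lookup statistics of a sixth forwarding table for the first service, where the sixth forwarding table is of the same type as the second forwarding table, precedes the second forwarding table in the lookup order, and is located in a different forwarder from the second forwarding table.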
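In one example, the second forwarding table is the RE forwarding table containing RE4 362 in the forwarder shown in FIG. 3; the sixth forwarding table can then be an RE4 forwarding table in another forwarder that precedes it, for example the RE4 forwarding table containing RE4 267 shown in FIG. 2. Two forwarding tables of the same type are forwarding tables that have the same structure or the same field correspondence. A field correspondence is the correspondence between two fields. For example, when one field is an IP address and the other field is a MAC address, the field correspondence based on these two fields is that the IP address corresponds to the MAC address.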
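710. The controller determines fourth information, where the fourth information indicates that the lookup statistics of the sixth forwarding table for the first service are different from, and not similar to, the lookup statistics of the second forwarding table for the first service. "Not similar" means that the difference between the two lookup statistics exceeds a set threshold.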
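In one example, the lookup statistics of the sixth forwarding table for the first service are a lookup success rate of 90%, and the lookup statistics of the second forwarding table for the first service are a lookup success rate of 70%. The two results are clearly different. Assume that, in this example, any result up to 10 percentage points above or below a given lookup statistic is regarded as similar to it. Then, relative to the sixth forwarding table, a target forwarding table whose lookup success rate falls between 80% and 100% has lookup statistics similar to those of the sixth forwarding table. Because the lookup success rate of the second forwarding table is 70%, which is outside the range of 80% to 100%, the lookup statistics of the two forwarding tables for the first service are considered not similar.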
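715. The controller determines, based on the first information and the fourth information, that the second forwarding table is the faulty forwarding table.
After this verification, the second forwarding table can be determined to be the faulty forwarding table with greater accuracy.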
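The verification in operations 705 to 715 can be sketched with the figures from the example above (90% for the sixth table, 70% for the second table, a similarity window of 10 percentage points). The function names are hypothetical, and treating a difference of exactly 10 points as still similar is an assumption; the text does not say how the boundary is handled.

```python
def is_similar(rate_a: float, rate_b: float, window: float) -> bool:
    """Two success rates are similar when they differ by at most window."""
    return abs(rate_a - rate_b) <= window


def confirm_fault(first_information: bool, sixth_rate: float,
                  second_rate: float, window: float) -> bool:
    """Confirm the second table as faulty only when the first information
    holds and the same-type sixth table's statistics are not similar."""
    fourth_information = not is_similar(sixth_rate, second_rate, window)
    return first_information and fourth_information


print(confirm_fault(True, 90, 70, 10))  # True: 70% is outside 80%-100%
print(confirm_fault(True, 90, 85, 10))  # False: 85% is within the window
```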
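FIG. 8 is a schematic structural diagram of a controller 800 according to an embodiment of the present invention.
The controller 800 may be the controller 110 in the foregoing embodiments and can, based on its internal units, perform all operations performed by the controller 110 in those embodiments, including all operations performed by the controller in the methods shown in FIGS. 1-7.
As shown in FIG. 8, the controller 800 includes a receiving unit 805, a state determining unit 810, and a fault determining unit 815. The receiving unit 805, the state determining unit 810, and the fault determining unit 815 may be three independent hardware units, or may be three software units.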
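The receiving unit 805 is configured to receive, from one or more forwarders, multiple lookup statistics of multiple forwarding tables for a first service, where the multiple forwarding tables have a lookup order for the first service, the multiple forwarding tables include a first forwarding table and a second forwarding table adjacent to the first forwarding table, and the first forwarding table precedes the second forwarding table in the lookup order.
The state determining unit 810 is configured to determine first information based on the lookup statistics of the first forwarding table for the first service and the lookup statistics of the second forwarding table for the first service, where the first information indicates that the lookup behavior of the first forwarding table for the first service is normal and the lookup behavior of the second forwarding table for the first service is abnormal.
The fault determining unit 815 is configured to determine, based on the first information, that the second forwarding table is the forwarding table that is faulty for the first service.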
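The fault determining unit 815 may be further configured to: determine second information, where the second information indicates that the lookup statistics of the third forwarding table for the first service are the same as or similar to the lookup statistics of the second forwarding table for the first service, and the third forwarding table follows the second forwarding table in the lookup order; and determine, based on the second information, that the third forwarding table is not faulty for the first service.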
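The receiving unit 805 may be specifically configured to: receive the lookup statistics of a fourth forwarding table for the first service and the lookup statistics of a fifth forwarding table for the first service, where the fourth forwarding table precedes the first forwarding table in the lookup order, the fifth forwarding table follows the second forwarding table in the lookup order, and the controller has not received the lookup statistics, for the first service, of the forwarding tables between the fourth forwarding table and the fifth forwarding table; determine third information, where the third information indicates that the lookup behavior of the fourth forwarding table for the first service is normal and the lookup behavior of the fifth forwarding table for the first service is abnormal; and request, based on the third information, from one or more forwarders, the lookup statistics for the first service of the forwarding tables between the fourth forwarding table and the fifth forwarding table.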
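The fault determining unit 815 may be specifically configured to: determine the lookup statistics of a sixth forwarding table for the first service, where the sixth forwarding table is of the same type as the second forwarding table, precedes the second forwarding table in the lookup order, and is located in a different forwarder from the second forwarding table; determine fourth information, where the fourth information indicates that the lookup statistics of the sixth forwarding table for the first service are different from, and not similar to, the lookup statistics of the second forwarding table for the first service; and determine, based on the first information and the fourth information, that the second forwarding table is the faulty forwarding table.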
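The first forwarding table and the second forwarding table are located in one forwarder, or in different forwarders. The controller is a software-defined networking (SDN) controller, and the forwarder is an SDN forwarder. The first service is a native IPv4 service, an IPv4 over Generic Routing Encapsulation (IPv4 over GRE) service, a native IPv6 service, an IPv6 over IPv4 service, a Layer 3 virtual private network over Segment Routing (L3VPN over Segment Routing) service, a Layer 2 virtual private network over traffic engineering (L2VPN over TE) service, or an Ethernet virtual private network over Virtual Extensible LAN (EVPN over VxLAN) service. Alternatively, the first service may be an IPv4 service, an IPv6 service, an L3VPN service, an L2VPN service, or an EVPN service.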
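When the three units are software units, their division of labor might look like the sketch below. All class, method, and field names are hypothetical, the receiving unit is a stand-in that accepts reports already sorted in lookup order, and the state determining unit uses a simple absolute success-rate threshold; none of this is prescribed by the text.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class LookupStats:
    table: str
    success_rate: float  # percent, for one service


class ReceivingUnit:
    def receive(self, reports: List[LookupStats]) -> List[LookupStats]:
        # Stand-in for receiving reports from one or more forwarders.
        return list(reports)


class StateDeterminingUnit:
    def __init__(self, min_success_rate: float) -> None:
        self.min_success_rate = min_success_rate

    def first_information(
            self, stats: List[LookupStats]) -> Optional[Tuple[str, str]]:
        # Find adjacent tables where the earlier is normal, the later abnormal.
        for first, second in zip(stats, stats[1:]):
            if first.success_rate >= self.min_success_rate > second.success_rate:
                return first.table, second.table
        return None


class FaultDeterminingUnit:
    def faulty_table(self, info: Optional[Tuple[str, str]]) -> Optional[str]:
        return info[1] if info else None


class Controller:
    def __init__(self, min_success_rate: float) -> None:
        self.receiving = ReceivingUnit()
        self.state = StateDeterminingUnit(min_success_rate)
        self.fault = FaultDeterminingUnit()

    def locate(self, reports: List[LookupStats]) -> Optional[str]:
        stats = self.receiving.receive(reports)
        return self.fault.faulty_table(self.state.first_information(stats))


controller = Controller(min_success_rate=80)
print(controller.locate([LookupStats("FIB6 260", 90),
                         LookupStats("RE6 262", 60)]))  # RE6 262
```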
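FIG. 9 is a schematic structural diagram of a controller 900 according to an embodiment of the present invention. As shown in FIG. 9, the controller 900 includes a processor 910, a memory 920 in communication with the processor 910, and a transceiver 930.
The processor 910 may include one or more central processing units (CPUs), one or more network processors (NPs), one or more application-specific integrated circuits (ASICs), one or more programmable logic devices (PLDs), or a combination of some or all of the foregoing types of devices with processing capability. The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), a generic array logic (GAL), or any combination thereof.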
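The memory 920 may be a single memory or may include multiple memories. The memory 920 may include a volatile memory, such as a random-access memory (RAM); it may also include a non-volatile memory, such as a read-only memory (ROM), a flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); and it may include a combination of the foregoing types of memory. The memory 920 stores computer-readable instructions, which include an operating system 922 run by the controller 900 and multiple software units used to implement the controller 900, for example a receiving unit 924, a state determining unit 926, and a fault determining unit 928.
The transceiver 930 may refer to two interface boards, or to different ports on the same interface board.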
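When the processor 910, based on the operating system 922, executes the computer-readable instructions in the receiving unit 924, the state determining unit 926, and the fault determining unit 928, the processor 910 can perform, or cause the controller 900 to perform, the functions and operations of the receiving unit 805, the state determining unit 810, and the fault determining unit 815 in the controller 800.
After executing each software unit, the processor 910 can perform the corresponding operations as instructed by that software unit. After executing the computer-readable instructions in the memory 920, the processor 910 can, as instructed by those instructions, perform, or cause the controller 900 to perform, all operations performed by the controller described above in this application, for example the controller 110.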
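The foregoing is merely a preferred specific implementation of the present invention, but the protection scope of the present invention is not limited thereto. Any variation or replacement readily conceivable by a person familiar with the art within the technical scope disclosed in the present invention shall fall within the protection scope of the present invention.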

Claims (16)

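  1. A method for determining the location of a forwarding fault, characterized by comprising:
    receiving, by a controller from one or more forwarders, multiple lookup statistics of multiple forwarding tables for a first service, wherein the multiple forwarding tables have a lookup order for the first service, the multiple forwarding tables comprise a first forwarding table and a second forwarding table adjacent to the first forwarding table in the lookup order, and the first forwarding table precedes the second forwarding table in the lookup order;
    determining, by the controller, first information based on the lookup statistics of the first forwarding table for the first service and the lookup statistics of the second forwarding table for the first service, wherein the first information indicates that (1) the lookup behavior of the first forwarding table for the first service is normal and (2) the lookup behavior of the second forwarding table for the first service is abnormal; and
    determining, by the controller based on the first information, that the second forwarding table is the forwarding table that is faulty for the first service.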
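  2. The method according to claim 1, characterized in that the multiple forwarding tables further comprise a third forwarding table, the third forwarding table follows the second forwarding table in the lookup order, and the method comprises:
    determining, by the controller, second information, wherein the second information indicates that the lookup statistics of the third forwarding table for the first service are the same as or similar to the lookup statistics of the second forwarding table for the first service; and
    determining, by the controller based on the second information, that the third forwarding table is not faulty for the first service.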
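  3. The method according to claim 1, characterized in that the multiple forwarding tables further comprise a fourth forwarding table and a fifth forwarding table, and the receiving, by the controller, of multiple lookup statistics of multiple forwarding tables for the first service comprises:
    receiving, by the controller, the lookup statistics of the fourth forwarding table for the first service and the lookup statistics of the fifth forwarding table for the first service, wherein the fourth forwarding table precedes the first forwarding table in the lookup order, the fifth forwarding table follows the second forwarding table in the lookup order, and the controller has not received the lookup statistics, for the first service, of the forwarding tables between the fourth forwarding table and the fifth forwarding table;
    determining, by the controller, third information, wherein the third information indicates that the lookup behavior of the fourth forwarding table for the first service is normal and the lookup behavior of the fifth forwarding table for the first service is abnormal; and
    requesting, by the controller based on the third information, from the forwarder or forwarders that hold the forwarding tables between the fourth forwarding table and the fifth forwarding table, the lookup statistics of those forwarding tables for the first service.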
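  4. The method according to any one of claims 1 to 3, characterized in that the determining, by the controller based on the first information, that the second forwarding table is the faulty forwarding table comprises:
    determining the lookup statistics of a sixth forwarding table for the first service, wherein the sixth forwarding table is of the same type as the second forwarding table, the sixth forwarding table precedes the second forwarding table in the lookup order, and the sixth forwarding table and the second forwarding table are located in different forwarders;
    determining, by the controller, fourth information, wherein the fourth information indicates that the lookup statistics of the sixth forwarding table for the first service are different from, and not similar to, the lookup statistics of the second forwarding table for the first service; and
    determining, by the controller based on the first information and the fourth information, that the second forwarding table is the faulty forwarding table.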
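  5. The method according to any one of claims 1 to 4, characterized in that the first forwarding table and the second forwarding table are located in one forwarder, or in different forwarders.
  6. The method according to any one of claims 1 to 5, characterized in that the controller is a software-defined networking (SDN) controller, and the forwarder is an SDN forwarder.
  7. The method according to any one of claims 1 to 6, characterized in that the first service is a native IPv4 service, an IPv4 over Generic Routing Encapsulation (IPv4 over GRE) service, a native IPv6 service, an IPv6 over IPv4 service, a Layer 3 virtual private network over Segment Routing (L3VPN over Segment Routing) service, a Layer 2 virtual private network over traffic engineering (L2VPN over TE) service, or an Ethernet virtual private network over Virtual Extensible LAN (EVPN over VxLAN) service.
  8. The method according to any one of claims 1 to 7, characterized in that the first service is an IPv4 service, an IPv6 service, an L3VPN service, an L2VPN service, or an EVPN service.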
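  9. A controller, characterized by comprising:
    a receiving unit, configured to receive, from one or more forwarders, multiple lookup statistics of multiple forwarding tables for a first service, wherein the multiple forwarding tables have a lookup order for the first service, the multiple forwarding tables comprise a first forwarding table and a second forwarding table adjacent to the first forwarding table in the lookup order, and the first forwarding table precedes the second forwarding table in the lookup order;
    a state determining unit, configured to determine first information based on the lookup statistics of the first forwarding table for the first service and the lookup statistics of the second forwarding table for the first service, wherein the first information indicates that (1) the lookup behavior of the first forwarding table for the first service is normal and (2) the lookup behavior of the second forwarding table for the first service is abnormal; and
    a fault determining unit, configured to determine, based on the first information, that the second forwarding table is the forwarding table that is faulty for the first service.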
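  10. The controller according to claim 9, characterized in that
    the multiple forwarding tables further comprise a third forwarding table, and the third forwarding table follows the second forwarding table in the lookup order; and
    the fault determining unit is further configured to:
    determine second information, wherein the second information indicates that the lookup statistics of the third forwarding table for the first service are the same as or similar to the lookup statistics of the second forwarding table for the first service; and
    determine, based on the second information, that the third forwarding table is not faulty for the first service.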
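  11. The controller according to claim 9 or 10, characterized in that
    the multiple forwarding tables further comprise a fourth forwarding table and a fifth forwarding table; and
    the receiving unit is specifically configured to:
    receive the lookup statistics of the fourth forwarding table for the first service and the lookup statistics of the fifth forwarding table for the first service, wherein the fourth forwarding table precedes the first forwarding table in the lookup order, the fifth forwarding table follows the second forwarding table in the lookup order, and the controller has not received the lookup statistics, for the first service, of the forwarding tables between the fourth forwarding table and the fifth forwarding table;
    determine third information, wherein the third information indicates that the lookup behavior of the fourth forwarding table for the first service is normal and the lookup behavior of the fifth forwarding table for the first service is abnormal; and
    request, based on the third information, from the forwarder or forwarders that hold the forwarding tables between the fourth forwarding table and the fifth forwarding table, the lookup statistics of those forwarding tables for the first service.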
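  12. The controller according to any one of claims 9 to 11, characterized in that the fault determining unit is specifically configured to:
    determine the lookup statistics of a sixth forwarding table for the first service, wherein the sixth forwarding table is of the same type as the second forwarding table, the sixth forwarding table precedes the second forwarding table in the lookup order, and the sixth forwarding table and the second forwarding table are located in different forwarders;
    determine fourth information, wherein the fourth information indicates that the lookup statistics of the sixth forwarding table for the first service are different from, and not similar to, the lookup statistics of the second forwarding table for the first service; and
    determine, based on the first information and the fourth information, that the second forwarding table is the faulty forwarding table.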
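  13. The controller according to any one of claims 9 to 12, characterized in that the first forwarding table and the second forwarding table are located in one forwarder, or in different forwarders.
  14. The controller according to any one of claims 9 to 13, characterized in that the controller is a software-defined networking (SDN) controller, and the forwarder is an SDN forwarder.
  15. The controller according to any one of claims 9 to 14, characterized in that the first service is a native IPv4 service, an IPv4 over Generic Routing Encapsulation (IPv4 over GRE) service, a native IPv6 service, an IPv6 over IPv4 service, a Layer 3 virtual private network over Segment Routing (L3VPN over Segment Routing) service, a Layer 2 virtual private network over traffic engineering (L2VPN over TE) service, or an Ethernet virtual private network over Virtual Extensible LAN (EVPN over VxLAN) service.
  16. The controller according to any one of claims 9 to 15, characterized in that the first service is an IPv4 service, an IPv6 service, an L3VPN service, an L2VPN service, or an EVPN service.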
PCT/CN2019/128517 2018-12-29 2019-12-26 Method and device for determining the location of a forwarding fault WO2020135547A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP19903433.1A EP3886364A4 (en) 2018-12-29 2019-12-26 METHOD AND DEVICE FOR DETERMINING THE LOCATION OF A ROUTING FAILURE
US17/361,733 US11902087B2 (en) 2018-12-29 2021-06-29 Forwarding fault location determining method and device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811634089.1 2018-12-29
CN201811634089.1A CN111385120B (zh) 2018-12-29 2018-12-29 Method and device for determining the location of a forwarding fault

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/361,733 Continuation US11902087B2 (en) 2018-12-29 2021-06-29 Forwarding fault location determining method and device

Publications (1)

Publication Number Publication Date
WO2020135547A1 true WO2020135547A1 (zh) 2020-07-02

Family

ID=71129093

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/128517 WO2020135547A1 (zh) 2018-12-29 2019-12-26 Method and device for determining the location of a forwarding fault

Country Status (4)

Country Link
US (1) US11902087B2 (zh)
EP (1) EP3886364A4 (zh)
CN (1) CN111385120B (zh)
WO (1) WO2020135547A1 (zh)


Publication number Publication date
US20210328859A1 (en) 2021-10-21
CN111385120B (zh) 2021-10-26
US11902087B2 (en) 2024-02-13
CN111385120A (zh) 2020-07-07
EP3886364A1 (en) 2021-09-29
EP3886364A4 (en) 2022-01-26

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19903433

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2019903433

Country of ref document: EP

Effective date: 20210625