WO2021155637A1 - Information processing method, device, system and storage medium - Google Patents

Information processing method, device, system and storage medium

Info

Publication number
WO2021155637A1
WO2021155637A1 PCT/CN2020/083981 CN2020083981W WO2021155637A1 WO 2021155637 A1 WO2021155637 A1 WO 2021155637A1 CN 2020083981 W CN2020083981 W CN 2020083981W WO 2021155637 A1 WO2021155637 A1 WO 2021155637A1
Authority
WO
WIPO (PCT)
Prior art keywords
event
message
information
data
packet loss
Application number
PCT/CN2020/083981
Other languages
English (en)
French (fr)
Inventor
孙晨
刘洪强
周禹
Original Assignee
阿里巴巴集团控股有限公司
Application filed by 阿里巴巴集团控股有限公司 (Alibaba Group Holding Limited)
Publication of WO2021155637A1

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 41/00 - Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L 41/06 - Management of faults, events, alarms or notifications
    • H04L 41/0654 - Management of faults, events, alarms or notifications using network fault recovery
    • H04L 41/0631 - Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
    • H04L 41/0677 - Localisation of faults
    • H04L 41/08 - Configuration management of networks or network elements
    • H04L 41/0803 - Configuration setting
    • H04L 47/00 - Traffic control in data switching networks
    • H04L 47/10 - Flow control; Congestion control

Definitions

  • This application relates to the field of Internet technology, and in particular to an information processing method, device, system, and storage medium.
  • The network scale of data centers is getting larger and larger.
  • A single cluster may contain thousands of switches, tens of thousands of servers, and hundreds of thousands of optical and electrical links.
  • Network applications often encounter various performance problems, such as connection interruptions, decreased bandwidth, and increased latency. These performance problems can cause serious service quality degradation and cause losses to network operators.
  • A common application performance exception handling method in the prior art can be referred to as "online repair and offline diagnosis".
  • First, the network administrator locates the failed device or link.
  • Because the data center network has good redundancy, network administrators can safely isolate faulty devices or links without affecting the normal operation of network applications.
  • Then, the network administrator can diagnose the cause of the fault offline without affecting the normal operation of the network application.
  • In the prior art, network administrators usually locate faulty devices or links by combining coarse-grained information collected from multiple sources and guessing, based on experience, whether there is a problem with the network and, if so, where the problem might be. Such guesses may be wrong, and verifying them consumes a lot of time and slows down the localization process. As a result, locating a faulty device or link often takes minutes or even hours.
  • Various aspects of this application provide an information processing method, device, system, and storage medium to address problems such as the poor accuracy and slow speed of locating network problems.
  • An embodiment of the present application provides a network switching device, including a programmable data plane; the programmable data plane is programmed to select, from the data stream passing through the network switching device, the event message in which a set event occurs, and to provide event information to the data processing terminal based on the event message.
  • The event information is used to describe related information about the occurrence of the set event, and can be used to locate network problems related to the set event.
  • An embodiment of the present application also provides an information processing method, which is suitable for a network switching device.
  • The network switching device has a programmable data plane.
  • The method is implemented by the programmed data plane.
  • The method includes: selecting, from the data stream passing through the network switching device, the event message in which a set event occurs; and providing event information to the data processing terminal based on the event message, where the event information is used to describe related information about the occurrence of the set event and can be used to locate network problems related to the set event.
  • An embodiment of the present application also provides an information processing method, which is suitable for the data processing end. The method includes: receiving event information sent by a network switching device, where the event information is used to describe information related to a set event occurring in the data stream passing through the network switching device; and saving the event information and providing query operations to network administrators, so that network administrators can locate network problems related to the set event.
  • An embodiment of the present application also provides an information processing method, which is suitable for the data processing end.
  • The method includes: receiving an event message sent by a network switching device and its corresponding event metadata.
  • The event message is a message, in the data stream passing through the network switching device, in which a set event occurs.
  • Event information is extracted from the event message and the corresponding event metadata, where the event information is used to describe related information about the occurrence of the set event; the event information is saved, and query operations are provided to the network administrator, so that the network administrator can locate network problems related to the set event.
  • An embodiment of the present application also provides a data processing device, including a memory, a processor, and a communication component; the memory is used to store a computer program; the processor, coupled with the memory, is used to execute the computer program to: receive, through the communication component, event information sent by a network switching device, where the event information is used to describe information related to a set event occurring in the data stream passing through the network switching device; and save the event information and provide query operations to network administrators, so that network administrators can locate network problems related to the set event.
  • An embodiment of the present application also provides a data processing device, including a memory, a processor, and a communication component; the memory is used to store a computer program; the processor, coupled with the memory, is used to execute the computer program to: receive, through the communication component, event messages sent by a network switching device and their corresponding event metadata.
  • Event messages are messages in which a set event occurs in the data stream passing through the network switching device. Event information is extracted from the event messages and their corresponding event metadata.
  • The event information is used to describe related information about the occurrence of the set event; the event information is saved, and query operations are provided to the network administrator, so that the network administrator can locate network problems related to the set event.
  • An embodiment of the present application also provides a computer-readable storage medium storing a computer program.
  • When the computer program is executed by a processor, the processor is caused to implement the steps in the information processing method that can be executed by the data processing device provided in the embodiments of the present application.
  • An embodiment of the present application also provides a configuration method, which is suitable for a network switching device.
  • The network switching device includes a programmable data plane.
  • The method includes: in response to a configuration operation, obtaining a configuration file required by the programmable data plane;
  • and configuring the file to the programmable data plane to complete the configuration operation. The programmable data plane is configured to: select, from the data stream passing through the network switching device, the event message in which a set event occurs; and provide event information to the data processing terminal based on the event message.
  • The event information is used to describe related information about the occurrence of the set event, and can be used to locate network problems related to the set event.
  • An embodiment of the application also provides a data center system, including multiple servers, multiple network switching devices, and a data processing device; the multiple servers and the data processing device are respectively in communication connection with the multiple network switching devices; at least some of the multiple network switching devices include a programmable data plane, and the programmable data plane is programmed to select, from the data flow passing through the network switching device to which it belongs, the event message in which a set event occurs, and to provide event information to the data processing terminal based on the event message.
  • The event information is used to describe related information about the occurrence of the set event, and can be used to locate network problems related to the set event; the data processing device is used to obtain the event information provided by the programmable data plane, save the event information, and provide query operations to the network administrator, so that the network administrator can locate network problems related to the set event.
  • In these embodiments, the network switching device has a programmable data plane, and the programmability of the data plane is used to enable the data plane to accurately and timely select event messages and, based on the event messages, accurately and quickly report event information to the data processing terminal.
  • The data processing terminal saves the event information and, based on it, provides query operations for network administrators, which gives network administrators a basis for locating network problems accurately and quickly, and can solve problems such as poor accuracy and slow speed in locating network problems.
  • Fig. 1a is a schematic structural diagram of a data center system provided by an exemplary embodiment of this application;
  • FIG. 1b is a schematic structural diagram of another data center system provided by an exemplary embodiment of this application.
  • FIG. 1c is a schematic diagram of the principle of detecting a link packet loss event provided by an exemplary embodiment of this application;
  • FIG. 1d is a schematic diagram of a working principle of a programmable data plane provided by an exemplary embodiment of this application;
  • FIG. 1e is a schematic diagram of an event stack structure provided by an exemplary embodiment of this application and of event information being pushed into the event stack;
  • FIG. 1f is a schematic diagram of another working principle of a programmable data plane provided by an exemplary embodiment of this application.
  • Fig. 2a is a schematic structural diagram of a network switching device provided by an exemplary embodiment of this application.
  • FIG. 2b is a schematic structural diagram of another network switching device provided by an exemplary embodiment of this application.
  • FIG. 3a is a schematic flowchart of a configuration method provided by an exemplary embodiment of this application.
  • FIG. 3b is a schematic flowchart of an information processing method provided by an exemplary embodiment of this application.
  • FIG. 4a is a schematic flowchart of another information processing method provided by an exemplary embodiment of this application.
  • FIG. 4b is a schematic flowchart of yet another information processing method provided by an exemplary embodiment of this application.
  • Fig. 5a is a schematic structural diagram of a data processing device provided by an exemplary embodiment of this application.
  • Fig. 5b is a schematic structural diagram of another data processing device provided by an exemplary embodiment of this application.
  • An embodiment of the present application provides a network system, which includes: multiple network devices, multiple network switching devices, and data processing devices. Multiple network devices are in communication connection with multiple network switching devices, and multiple network switching devices are in communication connection with data processing devices. Of course, multiple network devices can also communicate directly or indirectly, and multiple network switching devices can also communicate directly or indirectly. The communication connection between these devices can be wired or wireless.
  • the implementation form of the network device is not limited. It can be any computer device that can access the network system.
  • it can be a terminal device such as a smart phone, a tablet computer, a personal computer, a notebook computer, an IoT device, or the like.
  • It can also be server equipment such as traditional servers, cloud servers, server arrays, cabinets, and mainframes.
  • the implementation form of the network switching device is not limited, and it can be any device with functions such as device interconnection, data exchange, and forwarding, such as a switch, a router, or a hub.
  • the implementation form of the data processing device is not limited. It can be any device with communication and data processing capabilities, such as a smart phone, a tablet computer, a personal computer, or a notebook computer. It can be server equipment such as traditional servers, cloud servers, server arrays, cabinets, and mainframes.
  • one or more network devices in the network system can be used as the data processing device in this embodiment; of course, the data processing device can also be separately deployed in the network system, which is not limited.
  • the network switching device has a control plane and a data plane.
  • the data plane of at least part of the network switching equipment can be programmed, that is, at least part of the network switching equipment in the network system has a programmable data plane. Utilizing the programmability of the data plane enables the data plane to accurately and timely select event messages, and accurately and quickly report event information to the data processing equipment based on the event messages; accordingly, the data processing equipment can save the event information, Based on event information, it provides query operations for network administrators, which provides a basis for network administrators to locate network problems accurately and quickly, and can solve problems such as poor positioning accuracy and slow speed of network problems.
  • the implementation form of the network system is not limited.
  • the network system can be implemented as a metropolitan area network, a local area network, an enterprise network or a campus network, etc., can also be implemented as a data center, cluster, or computer room, etc., or can also be implemented as a cloud such as a public cloud, a private cloud, an edge cloud, or a hybrid cloud.
  • a data center is taken as an example to illustrate a network system.
  • the network system shown in Fig. 1a can be referred to as a data center system.
  • the data center system includes: multiple servers 11, multiple network switching devices 12, and data processing devices 13.
  • the server 11 is mainly responsible for performing various computing tasks and can be considered as an end-side device.
  • The server 11 is only an example of an end-side device and is not limited to this; the main function of the network switching device 12 is to realize the interconnection between the servers 11, so it can be considered a network-side device.
  • Multiple servers 11 are interconnected through multiple network switching devices 12, and network data (for example, various messages) between the servers 11 can be forwarded through the network switching device 12.
  • A server 11 can be in direct communication connection with one, two, or more network switching devices 12, or it can directly communicate with other servers 11 and, using those servers 11 as relays, be in communication connection with one, two, or more network switching devices 12.
  • the communication connection here can be a wired connection or a wireless connection.
  • the number of servers 11 and network switching devices 12 is not limited, and can be determined by the scale of the data center system.
  • For example, a single cluster may contain thousands of network switching devices, tens of thousands of servers, and hundreds of thousands of optical and electrical links.
  • the implementation form of the network switching device 12 is not limited.
  • it may include a router, a switch, or a hub.
  • the network switching device 12 includes a switch and a router, but it is not limited thereto.
  • each network switching device 12 has a control plane and a data plane, and the control plane and the data plane are separated.
  • the control plane is equivalent to the brain of the network switching device 12 and runs on a certain hardware structure (such as a processor, chip, or board card, etc.) to implement the control logic of the network switching device 12.
  • the data plane mainly implements the data exchange function of the network switching device 12, and also runs on a certain hardware structure (for example, a chip, a board card, or a line card, etc.).
  • the control plane has programmability, which is the same as or similar to the prior art, and will not be repeated here.
  • In this embodiment, the data planes of at least some of the network switching devices 12 are programmable, that is, among the multiple network switching devices 12, at least some of the network switching devices 12 have programmable data planes. There are two situations in which at least some of the network switching devices 12 have a programmable data plane:
  • Case 1: Among the multiple network switching devices 12, all the network switching devices 12 have a programmable data plane.
  • Case 2: Among the multiple network switching devices 12, some of the network switching devices 12 have a programmable data plane, and some of the network switching devices 12 have a non-programmable data plane.
  • the non-programmable data plane means that the functions that the data plane can achieve are solidified and cannot be changed by network users.
  • Programmable data plane means that the functions that can be realized by the data plane are programmable. Network users can customize the functions of the data plane according to their own application requirements, and realize network data processing procedures that are independent of the protocol.
  • The data center system shown in Fig. 1a takes Case 2 as an example, that is, some network switching devices 12 have a programmable data plane and some network switching devices 12 have a non-programmable data plane. A data center system in which all network switching devices 12 are equipped with a programmable data plane is shown in FIG. 1b.
  • For a network switching device 12 with a programmable data plane, the programmability of the data plane is used to program the data plane so that the data plane at least realizes the following functions:
  • from the data stream passing through the network switching device 12, an event message in which a set event occurs is selected; based on the event message, event information is provided to the data processing device 13.
  • the data flow passing through a network switching device 12 refers to a collection of various messages that are sequentially sent from one server 11 to another server 11 via the network switching device 12 during a communication process.
  • the programmable data plane can identify the set event that occurs in each data stream, and can select the event message in which the set event occurs.
  • the event message is a message in which a set event occurs in the data stream, or a message in which a set event is encountered in the data stream.
  • The event information is used to describe related information about the occurrence of a set event, and can be used to locate network problems related to the set event (for example, a faulty link or device).
  • the content of the event information is not limited, and all relevant information that can describe the occurrence of a set event is applicable to the embodiment of this application.
  • the event information may include at least one of the following: the type of the set event (reflecting which type of event occurred), the detailed information of the occurrence of the set event (reflecting the detailed information of the occurrence of the set event), and the occurrence of the set event Data stream information (reflecting which data stream has set events).
  • the data flow information of the occurrence of the set event may be any information that can reflect the data flow of the occurrence of the set event, for example, it may be information such as a five-tuple or a two-tuple of the message.
  • the detailed information of the setting event includes, but is not limited to: the reason for the setting event, the location where the setting event occurs (such as port, queue, etc.), the result caused after the setting event occurs, and the time when the setting event occurs. Depending on the type of the event, the detailed information of the set event will be different, please refer to the example below.
  • the setting event is not limited. It can be any event related to a network failure, and can be flexibly set according to factors such as monitoring requirements, system characteristics, and application characteristics in the system.
  • the set event in the embodiment of the present application may include, but is not limited to: at least one of a congestion event, a pause event, a packet loss event, and a switching event.
  • Depending on the set event, the way in which the programmable data plane selects the event message of the set event from the data stream will differ, and correspondingly, the detailed information of the set event and the event information corresponding to the set event will also differ. The following exemplifies the way of selecting event messages and the corresponding event information in conjunction with the definitions of several set events.
  • Congestion is more common in data center systems and other networks.
  • applications such as MapReduce (mapping/reduce) are deployed on the server 11 in the data center system.
  • These applications generate a type of traffic pattern called incast, that is, multiple sender servers send data to the same receiver server at the same time.
  • the port of the network switching device 12 that is responsible for forwarding data to the receiver server will experience queue accumulation, and the messages queued therein will experience queuing delay, resulting in congestion.
  • the data center system adopts an unfair load balancing strategy, it may also cause congestion.
  • the network switching device 12 has multiple ingress ports and multiple egress ports.
  • A message enters the network switching device 12 from one ingress port, is switched internally to an egress port, and is sent out of the network switching device 12 from that egress port.
  • the programmable data plane can determine whether the queuing delay of the packets in the queue corresponding to each egress port exceeds the set delay threshold for each egress port of the network switching device 12, or determine the queue corresponding to each egress port Whether the length exceeds the set length threshold. If the judgment result is yes, it is determined that a congestion event has occurred on the egress port, and accordingly, the message in the queue corresponding to the egress port is the event message of the congestion event.
  • For a congestion event, the detailed information can include, but is not limited to: the information of the network switching device where the congestion occurred (such as an IP address), the information of the congested port (such as a port number), the information of the queue where the congestion occurred (such as a queue number), the approximate time of the congestion, and the queuing delay or queue length of the message. Accordingly, the corresponding event information can include, but is not limited to: the information of the network switching device where the congestion occurred (such as an IP address) / port information (such as a port number) / queue information (such as a queue number), the data flow information that encountered congestion (such as a five-tuple or other characteristics that can be used to identify a data flow), the approximate time when the congestion occurred, and the queuing delay or queue length of the packets.
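  • To make the congestion check above concrete, the following is a minimal Python sketch (not part of the original disclosure); the threshold values and field names are illustrative assumptions, and a real programmable data plane would implement this logic in hardware match-action stages rather than in software.

```python
# Illustrative sketch of the per-egress-port congestion check: a packet is tagged
# as a congestion event message when the queuing delay or the queue length of its
# egress queue exceeds a set threshold (threshold values are assumptions).

DELAY_THRESHOLD_US = 100      # assumed set delay threshold (microseconds)
QUEUE_LEN_THRESHOLD = 200     # assumed set length threshold (packets)

def is_congestion_event(queuing_delay_us: float, queue_length: int) -> bool:
    """Return True if the packet currently dequeued experienced congestion."""
    return queuing_delay_us > DELAY_THRESHOLD_US or queue_length > QUEUE_LEN_THRESHOLD

def make_congestion_event_info(switch_ip, egress_port, queue_id,
                               flow_tuple, queuing_delay_us, queue_length, ts):
    """Assemble the congestion event information fields listed above."""
    return {
        "event_type": "congestion",
        "switch": switch_ip,
        "port": egress_port,
        "queue": queue_id,
        "flow": flow_tuple,              # e.g. 5-tuple identifying the data stream
        "queuing_delay_us": queuing_delay_us,
        "queue_length": queue_length,
        "timestamp": ts,
    }
```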
  • Pause events: If the data center system is a lossless network, then for any network switching device in the lossless network, if the length of the queue corresponding to one of its egress ports exceeds the set length threshold, the network switching device can send PFC (Priority Flow Control) flow control packets to its upstream device (that is, the device that sends packets to that egress port of the network switching device), so that the upstream device suspends sending packets until the queue corresponding to the egress port drains.
  • If the queue is not emptied, the packets in the upstream device will continue to be suspended; once the situation deteriorates, it may cause a PFC storm and traffic outages across the entire network.
  • In this embodiment, the programmability of the data plane can be used to enable the data plane to autonomously identify the event message that experienced a pause and provide the data processing device 13 with event information corresponding to the pause event, so as to detect problems such as PFC storms and traffic suspensions in a timely and accurate manner.
  • Specifically, the data plane can record whether each egress port or queue in the network switching device is in the pause-sending state. Based on this, for a received message, if the message needs to be routed to a certain egress port of the network switching device (referred to as the target egress port), the programmable data plane can detect whether the target egress port or its corresponding queue is in the pause-sending state; if so, it is determined that a pause event has occurred, and the message is an event message that has experienced a pause event.
  • the corresponding event information may include but not limited to the following information: information (such as IP address)/port information (such as port number) of the network switching device where the pause event occurred/queue information (such as queue Number), the data stream information (such as a five-tuple) of the pause event, and the time when the pause event is encountered.
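  • The pause-event check can be sketched as follows; the per-queue state table and function names are assumptions for illustration only.

```python
# Minimal sketch of the pause-event check: the data plane keeps a paused/not-paused
# flag per egress queue; a packet routed to a paused queue is treated as an event
# message for a pause event.

paused = {}   # (egress_port, queue_id) -> bool, updated as PFC state changes

def on_pfc_state_change(egress_port: int, queue_id: int, is_paused: bool) -> None:
    """Record that the given egress queue entered or left the pause-sending state."""
    paused[(egress_port, queue_id)] = is_paused

def check_pause_event(egress_port: int, queue_id: int) -> bool:
    """Return True if the target egress queue is currently in the paused state."""
    return paused.get((egress_port, queue_id), False)
```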
  • Packet loss (Drop) events: In the data plane, packets may be discarded for many reasons, such as packet loss due to congestion, packet loss in the pipeline, and silent packet loss on a link. Packet loss may cause a sharp drop in application performance and cause losses. Packet loss events in the data plane can be roughly divided into device packet loss and link packet loss. For device packet loss events, the data plane can determine whether its pipeline has made a discard decision on a packet, or whether the queue corresponding to an egress port has discarded a packet due to congestion. If so, it is determined that a device packet loss event (such as a pipeline packet loss event or a congestion packet loss event) has occurred, and the discarded packet is an event packet in which the device packet loss event occurred.
  • In contrast, link packet loss events are often difficult to detect, for the following reasons: due to problems such as link failure, damage, contamination, or loose connectors, packets may experience bit-flip events on the link, that is, some bits are transmitted incorrectly, so that the packets arriving at the opposite end of the link cannot pass the packet format check and are directly discarded; in addition, because the received packets are corrupted, it is impossible to identify which data flow experienced the link packet loss event.
  • the "link" in the link packet loss event includes all modules and connection lines passing between the data planes of the two upstream and downstream network switching devices.
  • the number of correct packets sent by the upstream network switching device and the number of correct packets received by the downstream network switching device are theoretically equal and should be consistent.
  • a method for link packet loss detection based on message numbers is proposed. As shown in Figure 1c, taking the upstream network switching device A sending a packet to the downstream network switching device B as an example, the method includes the following operations:
  • Step 1: Before network switching device A sends out a message, its programmable data plane first numbers the message to be sent, and locally buffers the number of each message to be sent together with the data stream information to which it belongs.
  • Specifically, the network switching device A maintains a ring buffer locally, which is used to buffer the number of each message to be sent and the data flow information to which it belongs, and records, through a counter, the number of messages that have been buffered. Because the size of the ring buffer is limited, when the space of the ring buffer is used up, entries are overwritten sequentially from the beginning. It should be noted that using a ring buffer to buffer message numbers and data flow information is only an exemplary implementation, and the buffering method is not limited to it. For example, a non-circular buffer can also be used to buffer message numbers and data flow information.
  • Step 2: The network switching device A sends the message carrying the number to the network switching device B at the opposite end.
  • Step 3: The network switching device B performs packet loss detection, that is, it checks whether the numbers of the correctly received packets are continuous; if they are not continuous, it is considered that a link packet loss event has occurred.
  • For example, network switching device A sends packets numbered 10-16 to network switching device B in turn, but network switching device B receives only the packets numbered 10-12 and 14-16 and does not receive the packet numbered 13.
  • When the network switching device B receives the packet numbered 14, it can determine that the packet numbered 13 has been lost, and that a link packet loss event has occurred.
  • Step 4: The network switching device B sends a packet loss notification message to the network switching device A, and the packet loss notification message carries the number of the missing packet.
  • Step 5: After the network switching device A receives the packet loss notification message, it searches the locally buffered message numbers to determine the event message of the link packet loss event and its data flow information.
  • Specifically, the network switching device A can determine the event packet of the link packet loss event and its corresponding data flow information according to the number of the missing packet carried in the packet loss notification message, combined with the information recorded in the ring buffer.
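  • The following Python sketch illustrates steps 1-5 of the numbering-based link packet loss detection described above; the ring-buffer size and class names are assumptions, and packet reordering is ignored for simplicity.

```python
# Sketch of the numbering-based link packet-loss detection (steps 1-5 above).

class UpstreamSwitchA:
    def __init__(self, ring_size: int = 1024):
        self.ring = [None] * ring_size     # ring buffer: slot -> (number, flow info)
        self.counter = 0                   # counts how many packets have been numbered

    def send(self, flow_info):
        seq = self.counter
        self.ring[seq % len(self.ring)] = (seq, flow_info)   # step 1: buffer number + flow
        self.counter += 1
        return seq                         # step 2: the number carried in the packet

    def on_loss_notification(self, missing_seq):
        # step 5: look up which data flow the lost packet belonged to
        entry = self.ring[missing_seq % len(self.ring)]
        if entry and entry[0] == missing_seq:
            return entry[1]                # flow info of the link packet-loss event
        return None                        # already overwritten in the ring buffer

class DownstreamSwitchB:
    def __init__(self):
        self.expected = None

    def receive(self, seq):
        # step 3: a gap in the numbers of correctly received packets indicates a link
        # packet-loss event; step 4 would send the missing numbers back to switch A.
        missing = []
        if self.expected is not None and seq > self.expected:
            missing = list(range(self.expected, seq))
        self.expected = seq + 1
        return missing
```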
  • For a packet loss event, the detailed information can include, but is not limited to: the information of the network switching device where the packet was lost (such as an IP address), the information of the port where the packet was lost (such as a port number), the information of the queue where the packet was lost (such as a queue number), the approximate time when the packet loss occurred, and the reason for the packet loss. Accordingly, the corresponding event information may include, but is not limited to: the information of the network switching device where the packet loss event occurred (such as an IP address) / port information (such as a port number) / queue information (such as a queue number), the link on which the packet loss occurred, the data flow information in which the packet loss occurred, the time of the packet loss event, and the reason for the packet loss.
  • Switching events: In order to ensure high reliability, the data center system usually has high redundancy, and there may be multiple equal-cost paths between two servers 11. In order to make full use of the bandwidth of redundant links, load balancing algorithms such as Equal Cost Multiple Path (ECMP) run in the data center system to distribute traffic over multiple paths. However, in the event of a link failure, network switching device failure, protocol failure, or the like, one or more links cannot be used normally, and the load balancing algorithm redistributes the data flow to a new path. Likewise, the normal convergence of routing and switching protocols (such as BGP, OSPF, IS-IS) running in the data center system, or convergence caused by abnormal conditions such as link failures or network switching device failures, will also reassign data flows to new paths. These situations are called switchovers, and timely and accurate capture of switching events helps to quickly diagnose network faults.
  • In an optional embodiment, for each message to be forwarded, the programmable data plane can detect whether the data stream to which the message belongs appears at the network switching device for the first time, that is, whether the data stream is a new stream. If it is a new stream, it is determined that the data stream has been switched over from another path, and a switching event is considered to have occurred. Of course, a data stream that appears in the data center system for the first time will, in this embodiment, also be classified as having experienced a switching event.
  • For a switching event, the detailed information can include, but is not limited to: the information of the network switching device where the switchover occurred (such as an IP address), the information of the port where the switchover occurred (such as a port number), the information of the queue where the switchover occurred (such as a queue number), the approximate time of the switchover, and the new path after the switchover (equivalent to the result of the switching event). Accordingly, the corresponding event information may include, but is not limited to: the information of the network switching device where the switching event occurred (such as an IP address) / port information (such as a port number) / queue information (such as a queue number), the data flow information of the switching event, the time of the switching event, and the new path after the switchover.
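  • A minimal sketch of the new-flow check used to capture switching events is shown below; the unbounded set is an illustrative stand-in for the bounded structures (for example, hash tables or Bloom filters) a real data plane would use.

```python
# Sketch of the new-flow check for switching events: a flow is treated as "new" at
# this switch if its flow identifier has not been seen before.

seen_flows = set()

def is_switching_event(flow_tuple) -> bool:
    """Return True the first time a data stream appears at this switch."""
    if flow_tuple in seen_flows:
        return False
    seen_flows.add(flow_tuple)
    return True
```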
  • When the event messages are selected, event metadata corresponding to the event messages can also be generated.
  • Event metadata is data describing a set event, including but not limited to: the type of the set event and the detailed information about the occurrence of the set event.
  • Further, the programmable data plane can provide event information to the data processing device 13 based on the event message and its corresponding event metadata.
  • The implementation manners in which the programmable data plane provides event information to the data processing device include, but are not limited to, the following two:
  • Method 1: The programmable data plane extracts event information from the event message and its corresponding event metadata, and provides the event information to the data processing device. In Method 1, event information is provided directly to the data processing device 13.
  • Method 2: The programmable data plane sends the event message and its corresponding event metadata to the data processing device, so that the data processing device can extract the event information from the event message and its corresponding event metadata.
  • In Method 2, event information is provided to the data processing device 13 indirectly.
  • (1-1) Event message selection: The programmable data plane selects the event message in which a set event occurs from the data stream passing through the network switching device to which it belongs, and generates event metadata corresponding to the event message.
  • the packets that encounter events account for only a small part.
  • Therefore, selecting event packets can greatly reduce the network traffic that needs to be monitored; compared with copying the full amount of packets, the overhead can be reduced by one to two orders of magnitude.
  • For example, events E1 and E4 occurred in data streams s1 and s2; events E1, E4, and E5 occurred in data stream s3; event E2 occurred in data stream s4; events E2 and E4 occurred in data stream s5; and data streams s6 and s7 experienced events E3 and E5.
  • Depending on the set event, the method of selecting the event message in which the set event occurs will also differ.
  • the manner of selecting event messages for congestion events, pause events, packet loss events, and switching events can be referred to the foregoing embodiments, and will not be repeated here.
  • (1-2) Event message de-redundancy: The programmable data plane performs de-redundancy processing on the event messages to obtain the de-redundant target event messages and their corresponding event metadata.
  • the selected event message may contain multiple event messages under the same data stream.
  • For example, for event E1, two event messages under data stream s1 are selected and three event messages under data stream s3 are selected; for event E2, two event messages under data stream s4 are selected.
  • However, the report of an event only needs to include the detailed information and the data flow information of the event, and has no necessary relationship with the number of event messages.
  • the event message is processed to remove redundancy.
  • This method can further reduce event reporting traffic while ensuring event coverage, saving traffic transmission, processing, and storage overhead.
  • the method used for de-redundancy of the event message is not limited, and an example is described below.
  • For example, a hash-based de-redundancy method can be used, that is, for each event, the event message, or the header of the event message, or the flow information in the header of the message in which the event occurred is hashed to obtain a hash value, and event messages with the same hash value are removed.
  • Bloom Filter technology can be used.
  • In addition, a de-redundancy method based on exact matching can also be used, that is, for each event, the data flow information to which the event packets that experienced the event belong is accurately learned and recorded, and subsequent event packets belonging to these data flows are discarded, so as to achieve the purpose of de-redundancy.
  • In an optional embodiment, the constraint for de-redundancy of event messages can be set as: false negatives are 0, that is, for every data stream that experienced the event, at least one event message is retained, while false positives are minimized, that is, the redundant messages of each data stream are removed as much as possible.
  • To satisfy this constraint, the embodiment of the present application provides a new hierarchical group voting deduplication method.
  • In this method, the programmable data plane maintains an information table, which is called the first information table for ease of distinction; optionally, the first information table may be implemented with various data structures such as a hash table, an exact match table, or a linked list. Each entry in the first information table is used to record a piece of data flow information and the number of event messages corresponding to that data flow information.
  • For each event message, the hash value of the data flow information to which the event message belongs is calculated, and the hash value is used as an index for matching in the first information table. If no target entry corresponding to the hash value is matched, the event message is taken as a target event message, the data flow information to which the event message belongs is recorded in an empty entry, and the count of event messages corresponding to that data flow information is started.
  • If a target entry is matched, the data flow information recorded in the target entry is compared with the data flow information to which the event message belongs. If they are the same, the number of event messages corresponding to the target entry is increased by 1, and the event message is discarded. If they are not the same, the number of event messages corresponding to the target entry is reduced by 1, and it is judged whether the number of event messages after the subtraction is 0.
  • If the number is 0, the event message is taken as a target event message, the data flow information recorded in the target entry is replaced with the data flow information to which the event message belongs, and the count of event messages corresponding to that data flow information is restarted. If the number of event messages after the subtraction is not 0, the event message is discarded.
  • In this way, the probability that a large data stream remains in the first information table is increased, no false negatives are ensured, and false positives are reduced as much as possible.
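  • The following single-stage Python sketch illustrates the group voting logic described above; the table size and hash function are assumptions, and a hardware pipeline would implement the same logic with match-action tables and registers rather than Python objects.

```python
# Illustrative single-stage sketch of the hierarchical group voting deduplication.

TABLE_SIZE = 1024
table = [None] * TABLE_SIZE   # each entry: [data flow info, event message count]

def vote(flow_info) -> bool:
    """Return True if this event message should be kept as a target event message."""
    idx = hash(flow_info) % TABLE_SIZE
    entry = table[idx]
    if entry is None:                       # empty entry: record the flow, start counting
        table[idx] = [flow_info, 1]
        return True
    if entry[0] == flow_info:               # same flow already recorded: redundant message
        entry[1] += 1
        return False
    entry[1] -= 1                           # different flow: vote against the current holder
    if entry[1] == 0:                       # holder voted out: replace it and restart count
        table[idx] = [flow_info, 1]
        return True
    return False                            # holder survives: discard this event message
```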
  • In an optional embodiment, the above method can also be cascaded in multiple stages in series.
  • different hashing algorithms can be used for different levels. The same data stream information will be hashed into different entries at different levels, thereby further reducing false positives.
  • the above method can be split into multi-stage pipelines in the data plane for implementation.
  • For example, in the first-stage pipeline, it can be confirmed whether the IP address part in the data flow information is the same; in the second-stage pipeline, whether the port part in the data flow information is the same; and so on.
  • The above scheme can be further divided into finer granularity: for example, in the first-stage pipeline, only whether the source IP addresses in the data flow information are the same is confirmed; in the second-stage pipeline, only whether the destination IP addresses are the same; in the third-stage pipeline, whether the port part in the data flow information is the same; and so on.
  • the specific splitting method depends on the resource conditions of all levels of pipelines in the network switching equipment.
  • (1-3) Event information extraction: The programmable data plane extracts event information from the target event message and its corresponding event metadata.
  • Event messages and their corresponding event metadata contain event-related information, so event information can be extracted from the event messages and their corresponding event metadata; that is, part of the event information comes from the event metadata, and part of the event information comes from the event message.
  • An event message includes a lot of information, such as the message header and message payload, some of which has nothing to do with the event. For example, apart from the information that can identify the data flow, the rest of the message header and the message payload are useless information for event reporting.
  • partial event information related to the event is extracted from the event message, and the size of the extracted event-related information (for example, 20 bytes) is much smaller than the event message, which can further reduce the event reporting traffic.
  • the information in the event metadata can be all reported as event information, or part of it can be selected to be reported, which is not limited. As shown in Figure 1d, after the event information extraction operation, the event information corresponding to the events E1-E5 can be obtained, and the size of these event information is significantly smaller than the size of the event messages under the events E1-E5 and the corresponding event metadata.
  • the programmable data plane may maintain an event stack (Event Stack) for temporarily storing event information.
  • Event Stack includes two parts: a Stack Top Counter and event storage.
  • the stack top counter is used to record the number of event information temporarily stored in the event stack.
  • Event storage is used to store event information.
  • the event store may include one or more stack blocks. Based on the event stack, after the event information is extracted, the event information can be stored in at least one stack block in the event stack.
  • For different programmable data planes, the storage bit width may be different, so the maximum storage capacity of the stack blocks in the event stacks maintained by different programmable data planes may differ. However, for the same programmable data plane, the maximum storage capacity (that is, the maximum bit width) of each stack block is generally the same.
  • the event information can be completely stored in one stack block. If the size of the event information is greater than the maximum bit width of the stack block, the event information can be split into multiple information blocks, and the multiple information blocks can be stored in multiple stack blocks. The size of each information block is less than or equal to the size of the stack block. Maximum bit width.
  • For example, assuming the event information is 20 bytes, it can be split into 3 information blocks whose sizes are 64 bits (that is, 8 bytes), 64 bits, and 32 bits (that is, 4 bytes), and the 3 information blocks are then stored in the 3 stack blocks shown in Figure 1e.
  • Among them, the third stack block only occupies 32 bits, leaving 32 bits of free space.
  • the 20-byte event information can also be split into five 32-bit information blocks, and the five information blocks can be stored in five stack blocks.
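  • A minimal sketch of splitting event information into fixed-width stack blocks, matching the 20-byte example above, is given below; the 64-bit block width is taken from the example and is otherwise an assumption.

```python
# Sketch of splitting one piece of event information into fixed-width stack blocks.

BLOCK_BITS = 64   # assumed maximum bit width of one stack block

def push_event_info(event_stack: list, info: bytes, block_bits: int = BLOCK_BITS) -> None:
    """Split event information into chunks of at most block_bits and push them onto the stack."""
    block_bytes = block_bits // 8
    for offset in range(0, len(info), block_bytes):
        event_stack.append(info[offset:offset + block_bytes])

stack = []
push_event_info(stack, b"\x00" * 20)   # 20-byte event info -> 8 + 8 + 4 byte blocks
# stack now holds three blocks of 64, 64 and 32 bits, matching the example above
```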
  • the event message can be processed accordingly. For example, event messages that experience congestion events or switching events can be forwarded out of the network switching device, and event messages that experience packet loss events or pause events can be discarded after the event information is extracted. For a pause event, a copy of the message in which the pause event occurred can be used as an event message. Since the event message is a duplicate message, discarding it will not affect the subsequent processing of the original message.
  • (1-4) Event information batch processing: a specified number of event information entries are spliced into one data packet, and the data packet is sent to the control plane of the network switching device or to the data processing device.
  • the event information extraction operation removes the useless information in the event message, and the event information of each data stream is relatively small, which is beneficial to reduce storage overhead.
  • Combining a specified number of event information entries into one data packet and sending them together can reduce the amount of data transmission and helps improve the throughput of the event information receiver (that is, the control plane or the data processing device).
  • Specifically, a data packet is used as a carrier: the data packet triggers a pop operation on the top element of the stack blocks and collects the event information at the top of the stack, splicing it with the event information already carried. If the number of event information entries carried in the data packet then reaches the specified number, the data packet is sent; the data packet is then copied and its content is cleared to start the next round of event information collection and splicing. If the number of event information entries carried by the data packet has not reached the specified number, the data packet is looped back to the event stack and continues to collect the event information at the top of the stack, until the number of event information entries carried by the data packet reaches the specified number.
  • The solid line shows the process of pushing event information into the stack blocks;
  • the dotted line shows the process of splicing the stack-top event information in the stack blocks into data packets.
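  • The batching procedure described above can be sketched as follows; the batch size and function names are illustrative assumptions.

```python
# Sketch of event-information batching: entries popped from the event stack are
# spliced into one data packet and shipped once the specified number is reached.

BATCH_SIZE = 8   # illustrative "specified number" of event information entries

def batch_and_send(event_stack: list, send_packet) -> None:
    packet = []
    while event_stack:
        packet.append(event_stack.pop())        # collect the top-of-stack event info
        if len(packet) == BATCH_SIZE:
            send_packet(packet)                 # ship one full data packet
            packet = []                         # start the next round of collection
    if packet:
        send_packet(packet)                     # flush the final, partially full packet
```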
  • (1-5) Event information de-redundancy: The control plane or the data processing device performs de-redundancy processing on the event information in the data packet.
  • the manner of de-redundant processing of event information is not limited.
  • the control plane (ie, CPU) of the network switching device or the CPU of the data processing device may maintain a second information table, and the second information table records event information that has been sent to the data processing terminal.
  • the second information table can be implemented in multiple data structures such as a hash table, an exact matching table, a linked list, and a hash.
  • During de-redundancy, the specified number of event information entries can be parsed from the data packet; for each parsed event information entry, it is checked whether there is a corresponding record in the second information table; if there is, it indicates that the event information is redundant and the event information can be discarded; if not, the event information is retained, and the event information that has not been discarded is recorded in the second information table.
  • When the event information de-redundancy operation is performed by the control plane of the network switching device, the undiscarded event information may also be repackaged into a new data packet and sent to the data processing device.
  • the event information can be hashed, and the hash value can be compared with the hash value in the hash table; if the hash value already exists in the hash table, then It indicates that the event information is redundant and can be discarded; otherwise, it indicates that the event information needs to be retained.
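  • A minimal sketch of this de-redundancy step follows; modelling the second information table as a set of hashes is an assumption, since the text allows several data structures.

```python
# Sketch of the event-information de-redundancy step on the control plane or the
# data processing side: keep only entries whose hash is not yet in the table.

second_info_table = set()

def deduplicate(event_infos):
    """Return only the event information entries not already recorded in the table."""
    kept = []
    for info in event_infos:
        key = hash(tuple(sorted(info.items())))   # hash of the event information fields
        if key not in second_info_table:
            second_info_table.add(key)
            kept.append(info)
    return kept
```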
  • In an optional embodiment, the process can also include: (1-6) Traffic shaping operations, that is, the control plane (i.e., CPU) of the network switching device can perform traffic shaping on the data packets to be reported, to prevent unexpected events from generating a large burst of upload traffic that impacts the network and the data processing device.
  • One method of traffic shaping is: the control plane of the network switching device (that is, the CPU) first caches the data packets that need to be sent locally in the CPU, and then sends it to the data processing device at a relatively stable rate.
  • a reliable connection is established between the control plane (ie, CPU) of the network switching device and the data processing device through a reliable transport layer protocol such as TCP.
  • In this case, the control plane (that is, the CPU) of the network switching device can send the shaped data packets to the data processing device over the reliable connection, such as TCP.
  • the reliable transport layer protocol can realize the packet loss retransmission function, can ensure the integrity of the event information, and help ensure the accuracy of the network administrator in locating network problems based on the event information.
  • an unreliable connection is established between the control plane (ie, CPU) of the network switching device and the data processing device through an unreliable transport layer protocol such as UDP.
  • the control plane (ie, CPU) of the network switching device can send the reshaped data packet to the data processing device through an unreliable connection such as UDP.
  • At this time, a method for checking data integrity is: the control plane (i.e., CPU) of the network switching device adds a sequence number to each data packet of event information to be transmitted and, after sending the data packet, buffers the data packet locally for a period of time; accordingly, after receiving a data packet, the data processing device can detect whether the sequence numbers of the data packets are continuous; if the sequence number of a received data packet is found to be discontinuous with those of previously received data packets, the sequence numbers of the missing data packets can be notified to the control plane (i.e., CPU) of the network switching device, and the control plane will resend those data packets. In this way, by tracking the sequence numbers of data packets, the packet loss problem can be handled, which helps to ensure the integrity of the event information.
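  • The sequence-number integrity check can be sketched as follows; the class names and retransmission hooks are illustrative assumptions.

```python
# Sketch of the sequence-number integrity check used when event packets are sent
# over an unreliable connection (e.g., UDP).

class SwitchCpuSender:
    def __init__(self):
        self.next_seq = 0
        self.recent = {}                     # seq -> payload, buffered for retransmission

    def send(self, payload, transmit):
        seq = self.next_seq
        self.next_seq += 1
        self.recent[seq] = payload           # keep the packet locally for a while
        transmit(seq, payload)

    def retransmit(self, missing_seqs, transmit):
        for seq in missing_seqs:
            if seq in self.recent:
                transmit(seq, self.recent[seq])

class CollectorReceiver:
    def __init__(self):
        self.expected = 0

    def receive(self, seq):
        """Return the missing sequence numbers to report back to the switch CPU, if any."""
        missing = list(range(self.expected, seq)) if seq > self.expected else []
        self.expected = max(self.expected, seq + 1)
        return missing
```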
  • (1-7) Event information storage: The data processing device 13 obtains the event information provided by the programmable data plane and saves it, for example by storing the event information in a database, and provides query operations for network administrators, so that network administrators can locate network problems related to the set event.
  • the data processing device 13 may classify and store event information according to event types.
• Information for each event includes: event type (such as congestion, pause, switching, or packet loss), data flow information of the event, detailed information related to the event (such as cause, port/queue, time of occurrence, etc.), and so on.
• Depending on the type of event, the event information will vary. The following is an example of the event information corresponding to different events (an illustrative structured sketch is given after this list):
• Congestion event: switch, egress port, egress queue, flow identifier (such as a 5-tuple of <source IP, destination IP, source port, destination port, protocol>, or a 2-tuple of <source IP, destination IP>, etc.), queuing delay, queue length, timestamp (indicating the time when the congestion event occurred);
• Pause event: switch, ingress port, egress port, egress queue, flow identifier, timestamp (indicating the time when the pause event occurred);
• Packet loss event: packet loss location (such as switch pipeline, switch buffer, or link), packet loss reason, flow identifier, timestamp (indicating the time when the packet loss event occurred);
• Switching event: switch, ingress port, egress port, egress queue, flow identifier, timestamp (indicating the time when the switching event occurred).
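• An illustrative structured form of these event records is sketched below; the field names and units are assumptions drawn from the descriptions above, not the actual on-wire format of the embodiment.

```python
# Illustrative data layout (assumed) for the event information listed above.
from dataclasses import dataclass

@dataclass
class CongestionEvent:
    switch: str
    egress_port: int
    egress_queue: int
    flow_id: tuple          # e.g. (src_ip, dst_ip, src_port, dst_port, proto) or (src_ip, dst_ip)
    queuing_delay_us: int
    queue_length: int
    timestamp: float        # time at which the congestion event occurred

@dataclass
class PauseEvent:
    switch: str
    ingress_port: int
    egress_port: int
    egress_queue: int
    flow_id: tuple
    timestamp: float

@dataclass
class PacketLossEvent:
    loss_location: str      # "pipeline", "buffer", or "link"
    loss_reason: str
    flow_id: tuple
    timestamp: float

@dataclass
class SwitchingEvent:
    switch: str
    ingress_port: int
    egress_port: int
    egress_queue: int
    flow_id: tuple
    timestamp: float
```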
• The data processing device 13 can provide network administrators with query operations in various dimensions, including but not limited to at least one of the following: query operations in the data stream dimension, query operations in the event dimension, query operations in the device dimension, and query operations in the time dimension.
  • the query operation of the data stream dimension refers to taking the specified data stream as the query object, and querying which events have occurred in the specified data stream at the specified time.
  • the query operation of the event dimension refers to taking the specified event as the query object, and querying the data streams where the specified event occurs at the specified time.
• The query operation of the device dimension refers to taking the specified device as the query object, and querying which events have occurred on the specified device during the specified time period.
• The query operation of the time dimension refers to taking the specified time as the query object, and querying which events have occurred in each data stream within the specified time.
• These dimensions can also be combined in any manner to form aggregated query dimensions; an illustrative sketch of such query operations is given below.
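• The following sketch illustrates, under an assumed record schema (flow_id, type, switch, timestamp), how queries in the data stream, event, device, and time dimensions, as well as an aggregated query, could be expressed; a real deployment would typically use a database instead.

```python
# Illustrative sketch (assumed): filtering stored event records along the query dimensions.
def query(events, flow_id=None, event_type=None, device=None, start=None, end=None):
    """Filter stored event records; each record is a dict with
    'flow_id', 'type', 'switch', and 'timestamp' keys (illustrative schema)."""
    result = []
    for e in events:
        if flow_id is not None and e["flow_id"] != flow_id:
            continue                                   # data stream dimension
        if event_type is not None and e["type"] != event_type:
            continue                                   # event dimension
        if device is not None and e["switch"] != device:
            continue                                   # device dimension
        if start is not None and e["timestamp"] < start:
            continue                                   # time dimension (lower bound)
        if end is not None and e["timestamp"] > end:
            continue                                   # time dimension (upper bound)
        result.append(e)
    return result

# Aggregated query example: congestion events of one flow on one switch within a window.
# query(events, flow_id=("10.0.0.1", "10.0.0.2"), event_type="congestion",
#       device="leaf-1", start=1000.0, end=2000.0)
```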
• Modification 1: the above operations (1-1)-(1-4) are implemented by the programmable data plane, and operations (1-5) and (1-7) are implemented by the data processing device.
• Modification 2: the above operations (1-1)-(1-3) are implemented by the programmable data plane, and operations (1-5) and (1-7) are implemented by the data processing device.
• Modification 3: the above operations (1-1) and (1-3) are implemented by the programmable data plane, and operations (1-5) and (1-7) are implemented by the data processing device.
• (2-1) Event message selection: the programmable data plane selects, from the data stream passing through the network switching device to which it belongs, the event messages in which a set event occurs, and generates event metadata corresponding to the event messages.
• (2-2) Event message de-redundancy: the programmable data plane performs de-redundancy processing on the event messages to obtain the de-redundant target event messages and their corresponding event metadata, and sends the target event messages together with their corresponding event metadata to the data processing device.
• (2-3) Event information extraction: the data processing device extracts event information from the target event message and its corresponding event metadata.
• (2-4) Event information de-redundancy: the data processing device performs de-redundancy processing on the event information in the data packet.
• (2-5) Event information storage: the data processing device saves the event information, for example, stores the event information in a database, and provides query operations to the network administrator for locating network problems related to the set events.
• The implementation of operation (2-1) is the same as that of operation (1-1) in the embodiment shown in FIG. 1d, and the de-redundancy processing of the event message in operation (2-2) is the same as that in operation (1-2), so they will not be repeated here.
• The principles of operations (2-3) and (2-4) are the same as those of operations (1-3) and (1-4) in the embodiment shown in FIG. 1d, with the difference that in the embodiment shown in FIG. 1d these operations are implemented by the data plane (hardware), while in the embodiment shown in FIG. 1f they are implemented by the data processing end (software). Therefore, the detailed implementation process will not be repeated here.
• As shown in FIG. 1f, the process includes: (2-1) event message selection, (2-2) event message de-redundancy, (2-3) event information extraction, (2-4) event information de-redundancy, and (2-5) event information storage.
  • (2-2) event message de-redundancy and (2-4) event information de-redundancy are optional operations. These optional operations can be used alternatively or combined in any manner.
• In addition to (2-1) event message selection, the (2-2) event message de-redundancy operation can also be moved to the data processing device for implementation, which yields Modification 4.
• Modification 4: the above operation (2-1) is implemented by the programmable data plane, and operations (2-2)-(2-5) are implemented by the data processing device.
• The above operation (2-1) can also be moved to the data processing device for implementation, that is, Modification 5 is obtained, in which the above operations (2-1)-(2-5) are all implemented by the data processing device.
• In this case, the programmable data plane can report all messages of the data stream (regardless of whether they encounter events) to the data processing device, and the data processing device selects the event messages in which set events occur and performs operations such as event information extraction.
  • the location of the network problem can be associated with the event encountered by the data stream in the system, which provides an opportunity for quickly and accurately locating the network problem.
• The data plane independently, accurately, and in a timely manner identifies, from the data stream, the event messages that encounter a set event.
• This solution can continuously, concurrently, and in real time monitor the events encountered by the data stream, including but not limited to packet loss, congestion, path change, and pause events, and even link silent packet loss events that are difficult to diagnose with traditional methods.
  • the functions of the programmable data plane are mainly introduced, and the implementation structure of the programmable data plane is not limited. Any implementation structure that can implement the various functions described in the foregoing embodiments is applicable to the embodiments of the present application.
• For example, the programmable data plane of the embodiments of the present application may adopt a pipeline structure; of course, a non-pipelined structure can also be used. Furthermore, the pipeline structures of different manufacturers will have their own merits in specific implementation. A specific pipeline structure is given in the following embodiments of this application.
  • Fig. 2a is a schematic structural diagram of a network switching device provided by an exemplary embodiment of this application.
  • the network switching device 20 includes a control plane 21 and a programmable data plane 22.
  • the control plane 21 is separated from the programmable data plane 22, but they can communicate with each other.
• The control plane 21 is equivalent to the brain of the network switching device and is responsible for implementing the control logic of the network switching device; for example, protocol message forwarding and protocol table entry calculation and maintenance all belong to the category of the control plane 21.
  • the programmable data plane 22 is responsible for the data exchange functions of the network switching device. For example, the reception, decapsulation, encapsulation, and forwarding of messages belong to the category of the programmable data plane 22.
  • the data plane 22 has programmability. Based on the programmability of the data plane 22, users are allowed to customize the functions of the data plane 22 according to their own application requirements.
• The data plane 22 is programmed to have the following functions: it can select event messages in which a set event occurs from the data stream passing through the network switching device 20, and provide event information to the data processing terminal based on the event messages; the event information is used to describe related information about the occurrence of the set event and can be used to locate network problems related to the set event.
• The programmable data plane 22 can identify the set events that occur in each data stream, and can select the event messages in which the set events occur.
  • the event message is a message in which a set event occurs in the data stream, or a message in which a set event is encountered in the data stream.
• The event information is used to describe related information about the occurrence of a set event, and can be used to locate network problems related to the set event (for example, fault location or device location).
  • the content of the event information is not limited, and all relevant information that can describe the occurrence of a set event is applicable to the embodiment of this application.
• The event information may include at least one of the following: the type of the set event, the detailed information of the set event, and the information of the data flow in which the set event occurs.
• The information of the data flow in which the set event occurs may be any information that can reflect that data flow, for example, the 5-tuple or 2-tuple of the message.
• The detailed information of the set event includes, but is not limited to: the cause of the set event, the location where the set event occurs (e.g., port, queue, etc.), the result caused by the set event, the time when the set event occurs, and so on. Depending on the type of event, the detailed information of the set event will be different.
  • the setting event is not limited. It can be any event related to a network failure, and can be flexibly set according to factors such as monitoring requirements, system characteristics, and application characteristics in the system.
  • the set event in the embodiment of the present application may include, but is not limited to: at least one of a congestion event, a pause event, a packet loss event, and a switching event.
• For the definitions of congestion events, pause events, packet loss events, and switching events, please refer to the description in the foregoing system embodiment, which will not be repeated here.
• The programmable data plane 22 has a pipeline structure. As shown in FIG. 2a, the programmable data plane 22 sequentially includes an ingress pipeline 221, a memory management unit (MMU) 222, and an egress pipeline 223.
• The ingress pipeline 221, the MMU 222, and the egress pipeline 223 sequentially perform message reception processing, message switching processing, and message sending processing on the data flow passing through the network switching device 20. That is, a message in a data stream first reaches the ingress pipeline 221, which receives and processes the message; the reception processing here includes, but is not limited to: temporarily storing the message in the ingress buffer, performing a correctness check on the message, looking up the routing table for the message to determine the target egress port corresponding to the message, and so on.
• The MMU 222 mainly manages the buffer of the network switching device 20, manages the queues corresponding to each egress port of the network switching device 20 (a queue occupies part of the buffer area), and is responsible for copying messages from the ingress buffer into the queue corresponding to the target egress port, where they wait to be sent.
• The egress pipeline 223 is mainly responsible for sending out the messages in the queue corresponding to each egress port, and can also check the messages before they are sent out.
  • the input pipeline 221, the MMU 222, and the output pipeline 223 can be programmed to realize the event reporting function in addition to the traditional message processing function described above.
• The ingress pipeline 221 is also used to select the event messages in which set events occur during the reception processing of the messages of the data stream passing through the network switching device 20, and to report the selected event messages and their corresponding event metadata to the egress pipeline 223;
• The MMU 222 is also used to select the event messages in which set events occur during the switching processing of the messages of the data stream passing through the network switching device 20, and to report the selected event messages and their corresponding event metadata to the egress pipeline 223;
• The egress pipeline 223 is also used to select the event messages in which set events occur during the sending processing of the messages of the data stream passing through the network switching device 20, and to report event information to the data processing end according to the locally selected event messages and their corresponding event metadata as well as the event messages and corresponding event metadata reported by the ingress pipeline 221 and the MMU 222.
• The set events that occur during the message reception processing, the set events that occur during the message switching processing, and the set events that occur during the message sending processing may be different.
  • the set events include: congestion events, packet loss events, pause events, and switchover events
  • a pipeline packet loss event and/or a pause event may occur during the process of message receiving and processing.
• Buffer packet loss events may occur during the message switching processing;
  • congestion events, switching events, pipeline packet loss events, and/or link packet loss events may occur in the process of message sending and processing.
  • buffer packet loss events, link packet loss events, and pipeline packet loss events are all packet loss events.
• The ingress pipeline 221 needs to select the event messages in which packet loss events and/or pause events occur during the message reception processing, and report the selected event messages and their corresponding event metadata to the egress pipeline 223;
• The MMU 222 needs to select the event messages in which buffer packet loss events occur during the message switching processing, and report the selected event messages and their corresponding event metadata to the egress pipeline 223;
• The egress pipeline 223 needs to select the event messages in which congestion events, switching events, pipeline packet loss events, and/or link packet loss events occur during the message sending processing, and then report event information to the data processing end according to its own selected event messages and their corresponding event metadata as well as the received event messages and their corresponding event metadata.
• The specific implementation structures of the ingress pipeline 221, the MMU 222, and the egress pipeline 223 are not limited, and any implementation structure that can select the corresponding event messages is applicable to the embodiments of the present application. In the following, an exemplary implementation structure is provided for the ingress pipeline 221, the MMU 222, and the egress pipeline 223, respectively.
  • an implementation structure of the inbound pipeline 221 includes an inbound event detection module 2211.
• The ingress event detection module 2211 is mainly used to select the event messages in which a set event occurs during the message reception processing, generate event metadata corresponding to the event messages, and report the event messages and their corresponding event metadata to the egress pipeline 223.
  • the inbound pipeline 221 also includes some pipeline modules for receiving and processing messages, mainly including a table lookup module 2212 (Tables lookup) shown in FIG. 2b.
  • the table lookup module 2212 is mainly used to look up the routing table for each received message. If the routing information corresponding to the message is found, the target egress port corresponding to the message can be determined. After the target egress port corresponding to the message is determined, the message will be copied to the queue corresponding to the target egress port and wait for the egress pipeline 223 to send it out from the egress port.
• If the routing information corresponding to the message is not found, or the target egress port found is faulty, the message will be discarded (that is, a pipeline packet loss event occurs).
  • the working status of the outgoing port includes: normal sending status, suspended sending status, and fault status. If the working state of the target egress port is in the suspended sending state, it means that the message has encountered a suspend event and cannot be copied to the queue corresponding to the target egress port in time.
  • the inbound pipeline 221 may also include a verification module that performs various verifications such as the format of the received message; if the message fails the verification, it will be discarded (that is, a pipeline packet loss event occurs); If the message passes the check, the table lookup module 2212 will perform a table lookup for the message.
  • the check module used to check the message is an optional module, not a mandatory module.
• The ingress event detection module 2211 may include at least one of an ingress pipeline packet loss detection module 202 and a pause event detection module 201.
• The ingress pipeline packet loss detection module 202 is used to detect whether a pipeline packet loss event occurs during the message reception processing, and if so, generate event metadata and report the message in which the pipeline packet loss event occurs, as an event message together with the event metadata, to the egress pipeline 223.
  • the process of message reception processing is executed by the pipeline (for example, the check module and the table lookup module 2212) in the input pipeline 221, so the packet loss event in the message reception process is called the pipeline packet loss event.
• If the message reception processing includes a process of looking up the routing table for each received message, the ingress pipeline packet loss detection module 202 may be specifically used to: detect whether packet loss occurs in the process of looking up the routing table for each received message; if packet loss occurs in the process of looking up the routing table, determine that a pipeline packet loss event occurs.
• If the message reception processing includes a process of looking up the routing table for each received message and a process of verifying each received message, the ingress pipeline packet loss detection module 202 may be specifically used to: detect whether packet loss occurs in the process of looking up the routing table for each received message, and detect whether packet loss occurs during the verification of each received message; if packet loss is detected in either process, determine that a pipeline packet loss event occurs.
• If the message reception processing includes a verification process for each received message, the ingress pipeline packet loss detection module 202 may be specifically used to: detect whether packet loss occurs during the verification of each received message; if packet loss occurs during the verification, determine that a pipeline packet loss event occurs.
• The pause event detection module 201 is used to detect whether a pause event occurs during the message reception processing, and if so, generate event metadata and report the message in which the pause event occurs, as an event message together with the event metadata, to the egress pipeline 223.
  • a copy of the message in which the pause event occurs can be used as an event message to reduce the impact of event reporting on subsequent message processing. If the working state of a certain outgoing port in the network switching device 20 is in a sending-suspended state, and there are messages in the received messages that need to be routed to the outgoing port, it is considered that a suspension event has occurred.
  • the pause event detection module 201 is specifically configured to detect whether the target egress port is in a transmission pause state when the received message needs to be routed to the target egress port, and if so, determine that a pause event has occurred.
• The ingress pipeline 221 also includes an ingress link packet loss detection module 2213, which is used to detect whether a link packet loss event occurs in the data flow passing through the network switching device 20, and if so, send a packet loss notification message to the upstream device to notify it that a link packet loss event has occurred.
• The detection of the link packet loss event can be implemented by the network switching device 20 and its upstream device in cooperation with each other. Specifically, before sending a message to the network switching device 20, the upstream device may add a number to the message, and locally buffer the message number and its data flow information for a period of time. The network switching device 20 thus receives numbered messages, and the ingress link packet loss detection module 2213 specifically detects whether the numbers of the messages from the upstream device are continuous to determine whether packet loss has occurred on the link; if they are continuous, it is determined that no packet loss has occurred on the link; if they are not continuous, it is determined that packet loss has occurred on the link, that is, a link packet loss event has occurred.
• For the ingress link packet loss detection module 2213, by comparing the numbers of the received messages, the number of the lost message can be obtained, but it cannot know what the lost message is, nor the data flow information to which it belongs; these are known only by the upstream device. Therefore, when the ingress link packet loss detection module 2213 determines that a link packet loss event has occurred, it can carry the number of the lost message in the packet loss notification message and report it to the upstream device, so that the upstream device can not only determine that a link packet loss event has occurred, but also determine the event message of the link packet loss event and the data flow information to which it belongs, and then report the event to the data processing end.
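• A simplified sketch of this cooperative numbering scheme is given below; the class names and the callback used to notify the upstream device are illustrative assumptions, not the embodiment's hardware design.

```python
# Illustrative sketch (assumed): cooperative link packet loss detection. The upstream
# device numbers and caches outgoing messages; the downstream switch checks that the
# numbers are consecutive and reports the numbers of any missing messages back.
class UpstreamNumbering:
    def __init__(self):
        self._next = 0
        self._sent = {}                       # number -> flow information, cached for a while

    def stamp(self, flow_info) -> int:
        num = self._next
        self._next += 1
        self._sent[num] = flow_info
        return num

    def on_loss_notification(self, missing_numbers):
        # Map the missing numbers back to the lost messages' flow information.
        return [(n, self._sent.get(n)) for n in missing_numbers]

class DownstreamGapDetector:
    def __init__(self, notify_upstream):
        self._expected = 0
        self._notify_upstream = notify_upstream

    def on_message(self, number: int) -> None:
        if number > self._expected:           # non-consecutive numbers: link packet loss
            self._notify_upstream(list(range(self._expected, number)))
        self._expected = max(self._expected, number + 1)
```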
• An implementation structure of the MMU 222 includes: a buffer packet loss detection module 2221, which is used to detect whether a buffer packet loss event occurs in the process of buffering messages into the queue corresponding to each egress port, and if so, generate event metadata and report the message in which the buffer packet loss event occurs, as an event message together with the event metadata, to the egress pipeline 223.
  • an implementation structure of the outbound pipeline 223 includes: an event message processing module 2232 and an outbound event detection module 2231.
  • the outgoing event detection module 2231 is mainly used to select the event message in which a set event occurs during the message sending process, and report the event message and its corresponding event metadata to the event message processing module 2232.
• The event message processing module 2232 is used to receive the event messages and their corresponding event metadata reported by the ingress pipeline 221 (specifically, the ingress event detection module 2211 in the ingress pipeline 221) and the MMU 222 (specifically, the buffer packet loss detection module 2221 in the MMU 222), to receive the event messages and their corresponding event metadata sent by the egress event detection module 2231, and to provide event information to the data processing end based on these event messages and their corresponding event metadata.
  • the outgoing pipeline 223 also includes some pipeline modules for sending and processing messages, such as a verification module that performs various verifications such as the format of the message to be sent; If the message fails the verification, it will be discarded (that is, a pipeline packet loss event occurs); if the message passes the verification, the message will be sent out.
  • the check module used to check the message is an optional module, not a mandatory module.
• After the message is buffered in the queue corresponding to the target egress port, it waits to be sent. While waiting to be sent, the message may be lost due to congestion on the egress port. Further, after the message is sent, link packet loss may also occur.
• The message may also be rerouted onto the link where the network switching device 20 is located because its original link has failed, that is, a switching event may also occur.
• The egress event detection module 2231 may include at least one of: a congestion event detection module 203, a switching event detection module 204, an egress pipeline packet loss detection module 205, and an egress link packet loss detection module 206.
• The congestion event detection module 203 is used to detect whether a congestion event occurs on each egress port of the network switching device 20, and if so, generate event metadata and send the message in which the congestion event occurs, as an event message together with the event metadata, to the event message processing module 2232.
• The congestion event detection module 203 is specifically configured to: for each egress port, determine whether the queuing delay of the messages in the queue corresponding to the egress port exceeds a set delay threshold, or determine whether the length of the queue corresponding to the egress port exceeds a set length threshold; if so, confirm that a congestion event has occurred on the egress port.
  • the packets queued on the outbound port are the event packets in which the congestion event occurs.
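• As an illustration, the threshold check described above can be sketched as follows; the threshold values and units are placeholders, not values from the embodiment.

```python
# Illustrative sketch (assumed): per-egress-port congestion check against configured thresholds.
DELAY_THRESHOLD_US = 500      # set delay threshold (illustrative)
QUEUE_LENGTH_THRESHOLD = 200  # set queue length threshold in packets (illustrative)

def is_congested(queuing_delay_us: int, queue_length: int) -> bool:
    """A congestion event is confirmed if either the queuing delay or the
    queue length of the egress port exceeds its threshold."""
    return (queuing_delay_us > DELAY_THRESHOLD_US
            or queue_length > QUEUE_LENGTH_THRESHOLD)
```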
• The switching event detection module 204 is used to detect whether a switching event occurs in the network switching device 20, and if so, generate event metadata and send the message in which the switching event occurs, as an event message together with the event metadata, to the event message processing module 2232.
• The switching event detection module 204 is specifically configured to: for each message to be sent, detect whether the data stream information (for example, a 5-tuple or a 2-tuple) to which the message to be sent belongs appears for the first time; if so, confirm that a switching event has occurred.
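• A simplified software sketch of this first-appearance check is shown below; a hardware data plane would typically use a fixed-size hash or Bloom-filter table rather than an unbounded set, and the class name is an assumption.

```python
# Illustrative sketch (assumed): flag a switching (path change) event the first time a
# flow's identifying tuple is seen at this egress.
class SwitchingEventDetector:
    def __init__(self):
        self._seen_flows = set()

    def on_message(self, flow_id: tuple) -> bool:
        """Return True if this message indicates a switching event (flow seen for the first time)."""
        if flow_id in self._seen_flows:
            return False
        self._seen_flows.add(flow_id)
        return True
```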
• The egress pipeline packet loss detection module 205 is used to detect whether a pipeline packet loss event occurs during the message sending processing, and if so, generate event metadata and send the message in which the pipeline packet loss event occurs, as an event message together with the event metadata, to the event message processing module 2232.
• If the message sending processing includes a process of verifying each message to be sent, the egress pipeline packet loss detection module 205 is specifically configured to: detect whether packet loss occurs in the process of verifying each message to be sent; if so, determine that a pipeline packet loss event occurs.
• The egress link packet loss detection module 206 is used to detect whether a link packet loss event occurs during the message sending processing, and if so, generate event metadata and send the message in which the link packet loss event occurs, as an event message together with the event metadata, to the event message processing module 2232.
  • the network switching device 20 may cooperate with its downstream devices to complete link packet loss detection.
• The egress link packet loss detection module 206 numbers each message to be sent before sending it, so that the downstream device can assist in determining, according to the message numbers, whether a link packet loss event has occurred; and it detects whether a packet loss notification message, returned by the downstream device when the downstream device determines that a link packet loss event has occurred, is received, and if so, determines that a link packet loss event has occurred.
  • the downstream device will receive the numbered message sent by the network switching device 20, and by judging whether the message number is continuous, it can be determined whether packet loss occurs on the link between it and the network switching device 20.
  • the downstream device may also carry the number of the missing packet in the packet loss notification message and provide it to the outgoing link packet loss detection module 206 in the network switching device 20.
• The egress link packet loss detection module 206 is also specifically used to: locally buffer the number of each message to be sent and the data flow information to which it belongs; and determine, according to the number of the lost message carried in the packet loss notification message, the event message in which the link packet loss occurred and the data flow information to which it belongs.
• The event message processing module 2232 is in communication connection with the ingress pipeline packet loss detection module 202, the pause event detection module 201, the buffer packet loss detection module 2221, the congestion event detection module 203, the switching event detection module 204, the egress pipeline packet loss detection module 205, and the egress link packet loss detection module 206, respectively.
• Each of the above detection modules may send its selected event messages and their corresponding event metadata to the event message processing module 2232 through an internal port.
• The event message processing module 2232 is specifically configured to: send the received event messages and their corresponding event metadata to the data processing end, so that the data processing end can extract the event information from the event messages and their corresponding event metadata.
  • the event message processing module 2232 may directly send the received event message and its corresponding event metadata to the data processing end.
  • the event message processing module 2232 may perform de-redundancy processing on the received event message to obtain the target event message, and send the target event message and its corresponding event metadata to the data processing end.
  • de-redundant processing of event messages can further reduce the event reporting traffic while ensuring event coverage, saving traffic transmission, processing, and storage overhead.
• For details of the de-redundancy processing, please refer to the description in the following embodiments, which will not be detailed here.
• The event message processing module 2232 is specifically configured to extract event information from the received event messages and their corresponding event metadata, and provide the event information to the data processing terminal. Further optionally, the event message processing module 2232 may perform de-redundancy processing on the received event messages to obtain the target event messages, then extract event information from the target event messages and their corresponding event metadata, and provide the event information to the data processing end. Among them, de-redundancy processing of event messages can further reduce event reporting traffic while ensuring event coverage, saving traffic transmission, processing, and storage overhead.
  • the method used by the event message processing module 2232 to perform de-redundancy processing on the event message is not limited.
  • a hash-based de-duplication method or an exact-match-based de-duplication method may be used.
• Alternatively, the hierarchical group voting deduplication method provided in the embodiments of the present application may be adopted. Depending on the deduplication method adopted, the implementation structure of the event message processing module 2232 will differ.
• Taking the hierarchical group voting deduplication method provided in the embodiments of the present application as an example, an implementation structure of the event message processing module 2232 is given below.
• The event message processing module 2232 can take retaining one event message per data stream as the target, and perform de-redundancy processing on the received event messages to obtain the target event messages.
  • the event message processing module 2232 includes a de-redundancy sub-module and maintains a first information table; each entry in the first information table is used to record a piece of data flow information and the number of event messages corresponding to it.
• The de-redundancy sub-module is used to: for each received event message, use the hash value of the data stream information to which the event message belongs as an index, and perform matching in the first information table; if no corresponding target entry is matched, take the event message as a target event message, record the data flow information to which the event message belongs into an empty entry, and start counting the number of event messages; if a corresponding target entry is matched and the data flow information recorded in the target entry is the same as the data flow information to which the event message belongs, increase the number of event messages corresponding to the target entry by 1; if a corresponding target entry is matched but the data flow information recorded in the target entry is not the same as the data flow information to which the event message belongs, reduce the number of event messages corresponding to the target entry by 1; and if the number of event messages after subtracting 1 is 0, take the event message as a target event message, replace the data flow information recorded in the target entry with the data flow information to which the event message belongs, and restart the count of event messages corresponding to the target entry.
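• The voting logic described above can be illustrated with the following sketch; the table size and the use of Python's built-in hash are assumptions for the example, not the hardware layout of the first information table.

```python
# Illustrative sketch (assumed): hash-indexed first information table with per-entry voting.
class FirstInformationTable:
    def __init__(self, size: int = 1024):
        self._entries = [None] * size       # each entry: [flow_info, count] or None

    def keep(self, flow_info) -> bool:
        """Return True if this event message should be kept as a target event message."""
        idx = hash(flow_info) % len(self._entries)
        entry = self._entries[idx]
        if entry is None:                   # no match: record the flow and start counting
            self._entries[idx] = [flow_info, 1]
            return True
        if entry[0] == flow_info:           # same flow: one more (redundant) event message
            entry[1] += 1
            return False
        entry[1] -= 1                       # different flow: vote the current owner down
        if entry[1] == 0:                   # owner voted out: replace it and keep this message
            self._entries[idx] = [flow_info, 1]
            return True
        return False
```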
  • the event message processing module 2232 also includes: an event extraction sub-module, an event stack, and a batch processing sub-module.
  • the event stack includes a stack top counter and at least one stack block.
  • the event extraction sub-module is used to extract event information from the target event message obtained by the de-redundancy sub-module and its corresponding event metadata, and store the event information in at least one stack block in the event stack.
• The event extraction sub-module is specifically configured to: when the size of the event information is greater than the maximum bit width of the stack block, split the event information into multiple information blocks, and store the multiple information blocks in multiple stack blocks; the size of each information block is less than or equal to the maximum bit width.
  • the stack top counter is used to record the number of event information temporarily stored in at least one stack block.
  • the batch processing sub-module is used to extract a specified number of event information from at least one stack block, splice the specified number of event information into a data packet, and provide the data packet to the data processing end.
  • the specified number can be flexibly set according to factors such as pipeline resources, bandwidth, and application scenarios of the data plane, and is not limited. For example, the specified number can be 5, 8, 10, etc.
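• A software sketch of the event stack and batch splicing described above is given below; the block width, batch size, and byte-string representation of event information are illustrative assumptions.

```python
# Illustrative sketch (assumed): event stack with fixed-width blocks plus batch splicing.
MAX_BLOCK_BYTES = 16          # maximum width of one stack block (illustrative)
BATCH_SIZE = 8                # specified number of event information per data packet (illustrative)

class EventStack:
    def __init__(self):
        self._entries = []    # each entry is the list of stack blocks holding one event information
        self.top = 0          # stack-top counter: number of event information entries stored

    def push(self, event_info: bytes) -> None:
        # Split event information wider than one stack block into multiple blocks.
        blocks = [event_info[i:i + MAX_BLOCK_BYTES]
                  for i in range(0, len(event_info), MAX_BLOCK_BYTES)] or [b""]
        self._entries.append(blocks)
        self.top += 1

    def pop(self, count: int):
        taken, self._entries = self._entries[:count], self._entries[count:]
        self.top -= len(taken)
        return taken

def build_report_packet(stack: EventStack) -> bytes:
    # Splice a specified number of event information entries into one data packet.
    return b"".join(b"".join(blocks) for blocks in stack.pop(BATCH_SIZE))
```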
  • the batch processing sub-module can directly send the data packet carrying event information to the data processing end; or, it can also report the data packet carrying event information to the control plane 21 of the network switching device 20, and the control plane 21 sends the data packet to the data processing end.
• Before the control plane 21 sends the data packet to the data processing end, it can also perform de-redundancy on the event information carried in the data packet, so as to further reduce the event reporting traffic while ensuring event coverage, and save traffic transmission, processing, and storage overhead.
• The control plane 21 includes: a processor 211 and a memory 212; the memory 212 is used to store computer programs; the processor 211 executes the computer programs to perform de-redundancy processing on the event information in the data packet to obtain a new data packet, and to send the new data packet to the data processing end.
• The processor 211 may maintain a second information table locally, and the second information table is used to record the event information that has been sent to the data processing terminal. Based on this, the processor 211 is specifically configured to: parse a specified number of event information from the received data packet; for each parsed piece of event information, check whether there is a corresponding record in the second information table; if so, discard the event information; and re-encapsulate the event information that has not been discarded into a new data packet. Further, the processor 211 is also configured to record the event information that has not been discarded in the second information table, so as to perform de-redundancy on subsequently received event information.
• The processor 211 is also configured to perform traffic shaping on the new data packets sent to the data processing end, so as to prevent bursts of event information from generating a large amount of upload traffic that impacts the network and the data processing end.
• For related descriptions of de-redundancy and traffic shaping of event information, please refer to the foregoing system embodiment, which will not be repeated here.
  • Fig. 3a is a schematic flowchart of a configuration method provided by an exemplary embodiment of this application. This method is used to configure the network switching device provided in the above embodiment, and is mainly used to configure the function of the programmable data plane in the network switching device. As shown in Figure 3a, the method includes the following steps:
• Configure the above configuration file in the programmable data plane to complete the configuration operation; wherein the programmable data plane is configured to: select event messages in which a set event occurs from the data stream passing through the network switching device; and provide event information to the data processing terminal based on the event messages; the event information is used to describe related information about the occurrence of the set event, and can be used to locate network problems related to the set event.
• Various hardware programming languages can be used to generate the configuration files required by the data plane.
• For example, the P4 language (Programming Protocol-independent Packet Processors) can be used to generate the configuration file.
  • the configuration file can be uploaded to the data plane through an interface supported by the data plane.
  • the data plane of the network switching device is programmable, and network users can customize the functions of the data plane according to their own application requirements to implement network data processing procedures that are independent of the protocol.
• For the functions that the data plane has after being compiled and configured, reference may be made to the description of the foregoing embodiments, which will not be repeated here.
  • FIG. 3b is a schematic flowchart of an information processing method provided by an exemplary embodiment of this application. This method is applicable to the network switching device in the embodiment shown in FIGS. 2a-2b, and is specifically applicable to the programmable data plane in the network switching device, but is not limited to the programmable data plane in the foregoing embodiment. This method is also applicable to some non-programmable data planes that have the same or similar functions as the programmable data plane in the foregoing embodiments. As shown in Figure 3b, the method includes:
  • the event information is used to describe related information about the occurrence of a set event, and can be used to locate network problems related to the set event.
• The event information is used to describe related information about the occurrence of a set event, and can be used to locate network problems related to the set event (for example, fault location or device location).
  • the content of the event information is not limited, and all relevant information that can describe the occurrence of a set event is applicable to the embodiment of this application.
  • the event information may include at least one of the following: the type of the set event, the detailed information of the set event, and the data stream information of the set event.
  • the data flow information of the occurrence of the set event may be any information that can reflect the data flow of the occurrence of the set event, for example, it may be information such as a five-tuple or a two-tuple of the message.
• The detailed information of the set event includes, but is not limited to: the reason for the set event, the location where the set event occurs (such as port, queue, etc.), the result caused after the set event occurs, and the time when the set event occurs. Depending on the type of event, the detailed information of the set event will be different.
  • the setting event is not limited. It can be any event related to a network failure, and can be flexibly set according to factors such as monitoring requirements, system characteristics, and application characteristics in the system.
• The above-mentioned set event includes at least one of the following types: congestion event, pause event, packet loss event, and switching event.
• For the definitions and descriptions of the congestion event, the pause event, the packet loss event, and the switching event, please refer to the foregoing embodiments, which will not be repeated here.
  • the foregoing selection of an event message in which a set event occurs from a data stream passing through a network switching device includes at least one of the following selection operations:
  • the event message in which the set event occurs is selected and the event metadata corresponding to the event message is generated.
  • selecting an event message in which a set event occurs includes at least one of the following operations:
• Detect whether a pause event occurs during the message reception processing, and if so, treat the message in which the pause event occurs as an event message.
  • a copy of the message in which the pause event occurs can be used as an event message to reduce the impact of event reporting on subsequent message processing.
• Selecting the event message in which the set event occurs includes: detecting whether a buffer packet loss event occurs in the process of buffering the messages into the queues corresponding to the multiple egress ports of the network switching device; if so, regarding the message in which the buffer packet loss event occurs as an event message.
  • selecting an event message in which a set event occurs includes at least one of the following operations:
• Providing event information to the data processing terminal based on the event message includes: sending the event message and its corresponding event metadata to the data processing terminal, so that the data processing terminal can extract the event information from the event message and its corresponding event metadata; or, extracting the event information from the event message and its corresponding event metadata, and providing the event information to the data processing terminal.
• Before sending the event message and its corresponding event metadata to the data processing end, or before extracting the event information from the event message and its corresponding event metadata, the method further includes: taking retaining one event message per data stream as the target, performing de-redundancy processing on the event messages, and obtaining the target event messages.
• Performing de-redundancy processing on the event messages to obtain the target event messages includes: for each event message, using the hash value of the data stream information to which the event message belongs as an index to perform matching in the first information table, where each entry in the first information table is used to record a piece of data flow information and the number of event messages corresponding to it; if no corresponding target entry is matched, taking the event message as a target event message, recording the data flow information to which the event message belongs into an empty entry, and starting to count the number of event messages; if a corresponding target entry is matched and the data flow information recorded in the target entry is the same as the data flow information to which the event message belongs, increasing the number of event messages corresponding to the target entry by 1; if a corresponding target entry is matched but the data flow information recorded in the target entry is not the same as the data flow information to which the event message belongs, reducing the number of event messages corresponding to the target entry by 1; and if the number of event messages after subtracting 1 is 0, taking the event message as a target event message, replacing the data flow information recorded in the target entry with the data flow information to which the event message belongs, and restarting the count of event messages.
  • the method further includes: storing the event information in at least one stack block in the event stack.
  • providing the event information to the data processing end includes: extracting a specified number of event information from at least one stack block, splicing the specified number of event information into a data packet, and providing the data packet to the data processing end.
• Providing the data packet to the data processing end includes: the data plane directly sends the data packet to the data processing end; or, the data plane reports the data packet to the control plane, for the control plane to send the data packet to the data processing end.
  • the method further includes: the control plane performs de-redundancy processing on the event information in the data packet to obtain a new data packet; and the control plane sends the new data packet to the data processing end.
• The control plane performing de-redundancy processing on the event information in the data packet to obtain a new data packet includes: parsing a specified number of event information from the data packet; checking, for each parsed piece of event information, whether there is a corresponding record in the second information table; if so, discarding the event information; and re-encapsulating the undiscarded event information into a new data packet; where the second information table records the event information that has been sent to the data processing terminal.
  • the method further includes: the control plane performs traffic shaping on the new data packet during the process of sending the new data packet.
  • the network switching device has a programmable data plane.
• In this embodiment, the programmability of the data plane is used to enable the data plane to select event messages accurately and in a timely manner, and to report event information to the data processing terminal accurately and quickly based on the event messages; the data processing terminal saves the event information and, based on the event information, provides query operations for network administrators, which provides a basis for network administrators to locate network problems accurately and quickly, and can solve problems such as poor accuracy and slow speed in locating network problems.
  • FIG. 4a is a schematic flowchart of another information processing method provided by an exemplary embodiment of this application. This method is suitable for the data processing side. As shown in Figure 4a, the method includes:
• For how the network switching device selects the event message from the data stream and how the content of the event information is extracted from the event message, refer to the foregoing embodiments, which will not be described in detail in this embodiment.
  • the event information includes at least one of the following: the type of the set event, the detailed information of the set event, and the data stream information in which the set event occurs.
  • the aforementioned query operation includes at least one of the following: a query operation in a data stream dimension, a query operation in an event dimension, a query operation in a device dimension, and a query operation in a time dimension.
  • receiving event information sent by a network switching device includes: receiving a data packet sent by the network switching device; and parsing multiple event information from the data packet.
• Before saving the event information, the method further includes: performing de-redundancy processing on the event information. This can reduce the redundancy of the event information and save storage resources.
  • the query operation is provided to the network administrator based on the event information, which provides a basis for the network administrator to locate network problems accurately and quickly, and can solve problems such as poor positioning accuracy and slow speed of network problems.
  • FIG. 4b is a schematic flowchart of yet another information processing method provided by an exemplary embodiment of this application. This method is suitable for the data processing side. As shown in Figure 4b, the method includes:
• Before the event information is extracted from the event message and its corresponding event metadata, the method further includes: taking retaining one event message per data stream as the target, performing de-redundancy processing on the event messages, and obtaining the target event messages.
  • extracting event information from an event message and its corresponding event metadata is specifically: extracting event information from a target event message and its corresponding event metadata. Since the processing capability of the data processing terminal is relatively powerful, various methods can be used for de-redundancy processing, such as a deduplication method based on hash, a deduplication method based on precise matching, and so on.
• Before saving the event information, the method further includes: performing de-redundancy processing on the event information.
• Due to the relatively powerful processing capability of the data processing end, various methods can be used for the de-redundancy processing, such as a hash-based deduplication method, an exact-match-based deduplication method, and so on.
  • the execution subject of each step of the method provided in the foregoing embodiment may be the same device, or different devices may also be the execution subject of the method.
• For example, the execution subject of steps 41b to 43b may be device A; for another example, the execution subject of steps 41b and 42b may be device A, and the execution subject of step 43b may be device B; and so on.
  • Fig. 5a is a schematic structural diagram of a data processing device provided by an exemplary embodiment of this application. As shown in Fig. 5a, the device includes: a memory 51a, a processor 52a, and a communication component 53a.
  • the memory 51a is used to store computer programs, and can be configured to store other various data to support operations on the data processing device. Examples of such data include instructions for any application or method operating on the data processing device, contact data, phone book data, messages, pictures, videos, etc.
  • the memory 51a can be implemented by any type of volatile or nonvolatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable and Programmable Read Only Memory (EPROM), Programmable Read Only Memory (PROM), Read Only Memory (ROM), Magnetic Memory, Flash Memory, Magnetic Disk or Optical Disk.
• The processor 52a, coupled with the memory 51a, is used to execute the computer program in the memory 51a to: receive, through the communication component 53a, the event information sent by the network switching device, where the event information is used to describe related information of the set event that occurs in the data flow passing through the network switching device; and save the event information and provide query operations to the network administrator, so that the network administrator can locate network problems related to the set event.
  • the event information includes at least one of the following: the type of the set event, the detailed information of the set event, and the data stream information in which the set event occurs.
  • the query operation includes at least one of the following: a query operation in a data stream dimension, a query operation in an event dimension, a query operation in a device dimension, and a query operation in a time dimension.
• When the processor 52a receives the event information sent by the network switching device, it is specifically configured to: receive a data packet sent by the network switching device; and parse multiple pieces of event information from the data packet.
• By carrying multiple pieces of event information in one data packet, batch processing of event information can be realized, which helps reduce the amount of data transmission and improve the throughput of the data processing device.
• Before saving the event information, the processor 52a is further configured to perform de-redundancy processing on the event information. This can reduce the redundancy of the event information and save storage resources.
  • the data processing device further includes: a display 57a, a power supply component 58a, an audio component 59a and other components. Only part of the components are schematically shown in FIG. 5a, which does not mean that the data processing device only includes the components shown in FIG. 5a. In addition, the components in the dashed box in FIG. 5a are optional components, not mandatory components, and the specifics may depend on the product form of the data processing equipment.
  • the data processing device in this embodiment can be implemented as a terminal device such as a desktop computer, a notebook computer, and a smart phone, or can be a server device such as a conventional server, a cloud server, or a server array.
  • If the data processing device of this embodiment is implemented as a terminal device such as a desktop computer, a notebook computer, or a smartphone, it may include the components in the dashed box in Fig. 5a; if it is implemented as a server device such as a conventional server, a cloud server, or a server array, it may omit the components in the dashed box in Fig. 5a.
  • the embodiment of the present application also provides a computer-readable storage medium storing a computer program, which can implement each step in the method embodiment shown in FIG. 4a when the computer program is executed.
  • Fig. 5b is a schematic structural diagram of another data processing device provided by an exemplary embodiment of this application. As shown in Fig. 5b, the device includes: a memory 51b, a processor 52b, and a communication component 53b.
  • the memory 51b is used to store computer programs and can be configured to store various other data to support operations on the data processing device. Examples of such data include instructions for any application or method operating on the data processing device, contact data, phone book data, messages, pictures, videos, etc.
  • the memory 51b can be implemented by any type of volatile or nonvolatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable and Programmable Read Only Memory (EPROM), Programmable Read Only Memory (PROM), Read Only Memory (ROM), Magnetic Memory, Flash Memory, Magnetic Disk or Optical Disk.
  • The processor 52b, coupled with the memory 51b, is configured to execute the computer program in the memory 51b to: receive, through the communication component 53b, an event message sent by a network switching device and its corresponding event metadata, where the event message is a message in a data flow passing through the network switching device in which a set event occurs; extract event information from the event message and its corresponding event metadata, where the event information describes relevant information about the set event; and save the event information and provide query operations to a network administrator, so that the network administrator can locate network problems related to the set event.
  • the query operation includes at least one of the following: a query operation in a data stream dimension, a query operation in an event dimension, a query operation in a device dimension, and a query operation in a time dimension.
  • Before extracting the event information from the event message, the processor 52b is further configured to perform de-redundancy processing on the event messages, with the goal of retaining one event message per data flow, to obtain a target event message.
  • When extracting event information from the event message and its corresponding event metadata, the processor 52b is specifically configured to extract the event information from the target event message and its corresponding event metadata.
  • Before saving the event information, the processor 52b is further configured to perform de-redundancy processing on the event information.
  • the data processing device further includes: a display 57b, a power supply component 58b, an audio component 59b and other components. Only part of the components are schematically shown in FIG. 5b, which does not mean that the data processing device only includes the components shown in FIG. 5b. In addition, the components in the dashed box in FIG. 5b are optional components, not mandatory components, and the specifics may depend on the product form of the data processing equipment.
  • the data processing device in this embodiment can be implemented as a terminal device such as a desktop computer, a notebook computer, and a smart phone, or can be a server device such as a conventional server, a cloud server, or a server array.
  • If the data processing device of this embodiment is implemented as a terminal device such as a desktop computer, a notebook computer, or a smartphone, it may include the components in the dashed box in Fig. 5b; if it is implemented as a server device such as a conventional server, a cloud server, or a server array, it may omit the components in the dashed box in Fig. 5b.
  • the embodiment of the present application also provides a computer-readable storage medium storing a computer program, which can implement each step in the method embodiment shown in FIG. 4b when the computer program is executed.
  • the communication components in Figs. 5a and 5b described above are configured to facilitate wired or wireless communication between the device where the communication component is located and other devices.
  • the device where the communication component is located can access wireless networks based on communication standards, such as WiFi, 2G, 3G, 4G/LTE, 5G and other mobile communication networks, or a combination of them.
  • the communication component receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel.
  • the communication component further includes a near field communication (NFC) module to facilitate short-range communication.
  • The NFC module can be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
  • the above-mentioned display in Figs. 5a and 5b includes a screen, and the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user.
  • the touch panel includes one or more touch sensors to sense touch, sliding, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure related to the touch or slide operation.
  • the power supply components in Figures 5a and 5b above provide power for various components of the equipment where the power supply components are located.
  • the power supply component may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the device where the power supply component is located.
  • the audio components in Figs. 5a and 5b described above may be configured to output and/or input audio signals.
  • the audio component includes a microphone (MIC).
  • the microphone is configured to receive external audio signals.
  • the received audio signal can be further stored in a memory or sent via a communication component.
  • the audio component further includes a speaker for outputting audio signals.
  • the embodiments of the present invention can be provided as a method, a system, or a computer program product. Therefore, the present invention may adopt the form of a complete hardware embodiment, a complete software embodiment, or an embodiment combining software and hardware. Moreover, the present invention may adopt the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program codes.
  • These computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device that implements the functions specified in one or more processes in the flowchart and/or one or more blocks in the block diagram.
  • These computer program instructions can also be loaded onto a computer or other programmable data processing device, so that a series of operational steps are executed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device thus provide steps for implementing the functions specified in one or more processes in the flowchart and/or one or more blocks in the block diagram.
  • the computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
  • The memory may include non-persistent memory, random access memory (RAM), and/or non-volatile memory in computer-readable media, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
  • Computer-readable media include permanent and non-permanent, removable and non-removable media, and information storage can be realized by any method or technology.
  • the information can be computer-readable instructions, data structures, program modules, or other data.
  • Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, CD-ROM, digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission media that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media such as modulated data signals and carrier waves.

Abstract

The embodiments of this application provide an information processing method, device, system, and storage medium. In the embodiments of this application, a network switching device has a programmable data plane. The programmability of the data plane enables the data plane to select event messages accurately and promptly, and to report event information to a data processing end precisely and quickly based on the event messages. The data processing end saves the event information and, on the basis of the event information, provides query operations to a network administrator, which gives the network administrator a foundation for locating network problems accurately and quickly and can solve problems such as poor accuracy and slow speed in locating network problems.

Description

信息处理方法、设备、系统及存储介质
交叉引用
本申请引用于2020年2月07日提交的专利名称为“信息处理方法、设备、系统及存储介质”的第2020100823095号中国专利申请,其通过引用被全部并入本申请。
技术领域
本申请涉及互联网技术领域,尤其涉及一种信息处理方法、设备、系统及存储介质。
背景技术
数据中心的网络规模越来越庞大,单个集群可能包含上千台交换机、上万台服务器、上十万台光电线路。在这样庞大的网络中,由于各种软件、硬件的配置的问题或故障,网络应用会时常遇到各种性能问题,例如连接中断、带宽下降、延时上升等等。这些性能问题会引发严重的服务质量下降,并对网络运营商造成损失。
现有技术中常用的应用性能异常处理方式,可称之为“在线修复、离线诊断”。首先,网络管理员定位出发生故障的设备或链路。然后,由于数据中心网络具有较好的冗余性,网络管理员可以在不影响网络应用正常运行的情况下,安全地隔离故障设备或链路。最后,网络管理员在不影响网络应用正常运行的情况下,线下诊断故障原因。
在实际应用中,网络管理员定位发生故障的设备或链路的方式,通常是:结合从多个来源收集到的粗粒度信息,并根据经验猜测网络是否存在问题,如果存在,问题可能在哪里。这种猜测可能存在错误,其验证也耗费大量时 间,并拖慢定位进度,导致故障设备或链路的定位时间往往达到分钟级甚至小时级。
发明内容
本申请的多个方面提供一种信息处理方法、设备、系统及存储介质,用以解决网络问题定位准确度差、速度慢等问题。
本申请实施例提供一种网络交换设备,包括:可编程的数据平面;可编程的数据平面被编程,以用于:从经过网络交换设备的数据流中,选取发生设定事件的事件报文;基于事件报文向数据处理端提供事件信息,事件信息用于描述发生设定事件的相关信息,可供定位与设定事件相关的网络问题。
本申请实施例还提供一种信息处理方法,适用于网络交换设备,网络交换设备具有可编程的数据平面,该方法由被编程后的数据平面实现,该方法包括:从经过网络交换设备的数据流中,选取发生设定事件的事件报文;基于事件报文向数据处理端提供事件信息,事件信息用于描述发生设定事件的相关信息,可供定位与设定事件相关的网络问题。
本申请实施例还提供一种信息处理方法,适用于数据处理端,该方法包括:接收网络交换设备发送的事件信息,事件信息用于描述经过网络交换设备的数据流发生设定事件的相关信息;保存事件信息,并面向网络管理员提供查询操作,以供网络管理员定位与设定事件相关的网络问题。
本申请实施例还提供一种信息处理方法,适用于数据处理端,该方法包括:接收网络交换设备发送的事件报文及其对应的事件元数据,事件报文是经过网络交换设备的数据流中发生设定事件的报文;从事件报文以及对应的事件元数据中提取事件信息,事件信息用于描述发生设定事件的相关信息;保存事件信息,并面向网络管理员提供查询操作,以供网络管理员定位与设定事件相关的网络问题。
本申请实施例还提供一种数据处理设备,包括:存储器、处理器以及通 信组件;存储器,用于存储计算机程序;处理器,与存储器耦合,用于执行计算机程序,以用于:通过通信组件接收网络交换设备发送的事件信息,事件信息用于描述经过网络交换设备的数据流发生设定事件的相关信息;保存事件信息,并面向网络管理员提供查询操作,以供网络管理员定位与设定事件相关的网络问题。
本申请实施例还提供一种数据处理设备,包括:存储器、处理器以及通信组件;存储器,用于存储计算机程序;处理器,与存储器耦合,用于执行计算机程序,以用于:通过通信组件接收网络交换设备发送的事件报文及其对应的事件元数据,事件报文是经过网络交换设备的数据流中发生设定事件的报文;从事件报文及其对应的事件元数据中提取事件信息,事件信息用于描述发生设定事件的相关信息;保存事件信息,并面向网络管理员提供查询操作,以供网络管理员定位与设定事件相关的网络问题。
本申请实施例还提供一种存储有计算机程序的计算机可读存储介质,当计算机程序被处理器执行时,致使处理器实现本申请实施例提供的可由数据处理设备执行的信息处理方法中的步骤。
本申请实施例还提供一种配置方法,适用于网络交换设备,网络交换设备包括可编程的数据平面,该方法包括:响应于配置操作,获取可编程的数据平面所需的配置文件;将配置文件配置至可编程的数据平面中,以完成配置操作;其中,可编程的数据平面被配置为:从经过网络交换设备的数据流中,选取发生设定事件的事件报文;基于事件报文向数据处理端提供事件信息,事件信息用于描述发生设定事件的相关信息,可供定位与设定事件相关的网络问题。
本申请实施例还提供一种数据中心系统,包括:多台服务器、多台网络交换设备以及数据处理设备;多台服务器与数据处理设备分别与多台网络交换设备通信连接;多台网络交换设备中至少部分网络交换设备包括可编程的数据平面,且可编程的数据平面被编程,可用于:从经过可编程的数据平面所属的网络交换设备的数据流中,选取发生设定事件的事件报文;基于事件 报文向数据处理端提供事件信息,事件信息用于描述发生设定事件的相关信息,可供定位与设定事件相关的网络问题;数据处理设备,用于获取可编程的数据平面提供的事件信息,保存事件信息,并面向网络管理员提供查询操作,以供网络管理员定位与设定事件相关的网络问题。
在本申请实施例中,网络交换设备具有可编程的数据平面,利用数据平面的可编程性,使能数据平面准确、及时地选取事件报文,并基于事件报文精准、快速地向数据处理端上报事件信息,数据处理端保存事件信息,以事件信息为基础面向网络管理员提供查询操作,为网络管理员准确、快速地定位网络问题提供了基础,可解决网络问题定位准确度差、速度慢等问题。
附图说明
此处所说明的附图用来提供对本申请的进一步理解,构成本申请的一部分,本申请的示意性实施例及其说明用于解释本申请,并不构成对本申请的不当限定。在附图中:
图1a为本申请示例性实施例提供的一种数据中心系统的结构示意图;
图1b为本申请示例性实施例提供的另一种数据中心系统的结构示意图;
图1c为本申请示例性实施例提供的检测链路丢包事件的原理的示意图;
图1d为本申请示例性实施例提供的可编程数据平面的一种工作原理示意图;
图1e为本申请示例性实施例提供的事件栈结构以及向事件栈存取事件信息的示意图;
图1f为本申请示例性实施例提供的可编程数据平面的另一种工作原理示意图;
图2a为本申请示例性实施例提供的一种网络交换设备的结构示意图;
图2b为本申请示例性实施例提供的另一种网络交换设备的结构示意图;
图3a为本申请示例性实施例提供的一种配置方法的流程示意图;
图3b为本申请示例性实施例提供的一种信息处理方法的流程示意图;
图4a为本申请示例性实施例提供的另一种信息处理方法的流程示意图;
图4b为本申请示例性实施例提供的又一种信息处理方法的流程示意图;
图5a为本申请示例性实施例提供的一种数据处理设备的结构示意图;
图5b为本申请示例性实施例提供的另一种数据处理设备的结构示意图。
具体实施方式
为使本申请的目的、技术方案和优点更加清楚,下面将结合本申请具体实施例及相应的附图对本申请技术方案进行清楚、完整地描述。显然,所描述的实施例仅是本申请一部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。
本申请实施例提供一种网络系统,该网络系统包括:多台网络设备、多台网络交换设备以及数据处理设备。多台网络设备与多台网络交换设备通信连接,多台网络交换设备与数据处理设备通信连接。当然,多台网络设备之间也可以直接或间接通信连接,多台网络交换设备之间也可以直接或间接通信连接。这些设备之间的通信连接方式可以是有线连接,也可以是无线连接。
在本实施例中,并不限定网络设备的实现形式,可以是任何能够接入网络系统的计算机设备,例如可以是智能手机、平板电脑、个人电脑、笔记本电脑、IoT设备等终端设备,也可以是传统服务器、云服务器、服务器阵列、机柜、大型机等服务器设备。同理,在本实施例中,也不限定网络交换设备的实现形式,可以是任何具有设备互联和数据交换、转发等功能的设备,例如可以是交换机、路由器或集线器等。同理,在本实施例中,也不限定数据处理设备的实现形式,可以是任何具有通信和数据处理能力的设备,例如可以是智能手机、平板电脑、个人电脑或笔记本电脑等终端设备,也可以是传统服务器、云服务器、服务器阵列、机柜、大型机等服务器设备。
可选地,可以将网络系统中的一台或多台网络设备作为本实施例中的数据处理设备;当然,也可以在网络系统中单独部署数据处理设备,对此不做限定。
在本实施例中,网络交换设备具有控制平面和数据平面。其中,至少部分网络交换设备的数据平面可被编程,即网络系统中至少部分网络交换设备具有可编程的数据平面。利用数据平面的可编程性,可使能数据平面准确、及时地选取事件报文,并基于事件报文精准、快速地向数据处理设备上报事件信息;相应地,数据处理设备可保存事件信息,以事件信息为基础面向网络管理员提供查询操作,为网络管理员准确、快速地定位网络问题提供了基础,可解决网络问题定位准确度差、速度慢等问题。
在本实施例中,并不限定网络系统的实现形态。例如,网络系统可以实现为城域网、局域网、企业网或校园网等,也可以实现为数据中心、集群或机房等,或者还可以实现为公有云、私有云、边缘云或混合云等云网络。在图1a中以数据中心为例对网络系统进行图示,图1a所示网络系统可被称为数据中心系统。
如图1a所示,该数据中心系统包括:多台服务器11、多台网络交换设备12以及数据处理设备13。其中,服务器11主要负责执行各种计算任务,可认为是端侧设备,服务器11仅是端侧设备的一种示例,并不限于此;网络交换设备12主要作用是实现服务器11之间的互联,可认为是网络侧设备。多台服务器11之间通过多台网络交换设备12进行互联,服务器11之间的网络数据(例如各种报文)可经过网络交换设备12进行转发。
如图1a所示,一台服务器11可以直接与一台、两台或两台以上的网络交换设备12通信连接,也可以直接与其他服务器11通信连接,并将其他服务器11作为中继,间接与一台、两台或两台以上的网络交换设备12通信连接。这里的通信连接可以是有线连接,也可以是无线连接。
需要说明的是,在数据中心系统中,除了包括服务器11、网络交换设备12以及数据处理设备13之外,还包括一些光电线路,用于实现服务器11、网 络交换设备12以及数据处理设备13之间的互联。在本实施例中,并不限定服务器11和网络交换设备12的数量,可由数据中心系统的规模决定。例如,在一些规模较大的数据中心系统中,单个集群可能包含上千台网络交换设备,上万台服务器,以及上十万台光电线路。
在本实施例中,并不限定网络交换设备12的实现形态,例如可以包括路由器、交换机或集线器等。例如,在图1a所示的数据中心系统中,网络交换设备12包括交换机和路由器,但并不限于此。
无论是哪种实现形态的网络交换设备,在本实施例中,如图1a所示,每台网络交换设备12具备控制平面和数据平面,控制平面与数据平面分离。控制平面相当于网络交换设备12的大脑,运行于一定硬件结构(例如处理器、芯片或板卡等)上,实现网络交换设备12的控制逻辑。数据平面主要实现网络交换设备12的数据交换功能,也运行于一定硬件结构(例如芯片、板卡或线卡等)上。其中,控制平面具有可编程性,这点与现有技术相同或类似,在此不再赘述。
在本实施例中,在多台网络交换设备12中,至少部分网络交换设备12的数据平面具有可编程性,即在多台网络交换设备12中,至少部分网络交换设备12具有可编程的数据平面。其中,至少部分网络交换设备12具有可编程的数据平面包括两种情况:
情况1:在多台网络交换设备12中,全部网络交换设备12均具有可编程的数据平面。
情况2:在多台网络交换设备12中,部分网络交换设备12具备可编程的数据平面,部分网络交换设备12具有不可编程的数据平面。其中,不可编程的数据平面是指数据平面所能实现的功能是固化好的,网络用户无法改变。可编程的数据平面是指数据平面所能实现的功能是可编程的,网络用户可以根据自己的应用需求自定义数据平面的功能,实现与协议无关的网络数据处理流程。
在图1a所示数据中心系统中,以情况2为例,即部分网络交换设备12具有 可编程的数据平面,部分网络交换设备12具有不可编程的数据平面为例进行图示。其中,全部网络交换设备12均具备可编程的数据平面的数据中心系统如图1b所示。
特别说明:对于不具有可编程数据平面的网络交换设备,若其同样具备与本申请实施例中可编程数据平面相同或类似的能力,则同样适用于本申请实施例。在本申请实施例中,重点针对具备可编程数据平面的网络交换设备12展开描述。
在本实施例中,对于具备可编程数据平面的网络交换设备12,利用其数据平面的可编程性,对数据平面进行编程致使其数据平面至少实现以下功能:从经过其所属网络交换设备12的数据流中,选取发生设定事件的事件报文;基于事件报文向数据处理设备13提供事件信息。
其中,经过一台网络交换设备12的数据流是指在一次通信过程中由一个服务器11经网络交换设备12依次发往另一服务器11的各种报文的集合。经过一台网络交换设备12的数据流可能是一条,也可能是多条。无论是一条数据流还是多条数据流,可编程的数据平面能够识别出每条数据流中发生的设定事件,并可选取发生设定事件的事件报文。其中,事件报文是数据流中发生设定事件的报文,或者是数据流中遇到设定事件的报文。
其中,事件信息用于描述发生设定事件的相关信息,可供定位与设定事件相关的网络问题(例如故障位置或设备)。在本实施例中,并不对事件信息的内容进行限定,凡是能够描述发生设定事件的相关信息均适用于本申请实施例。例如,事件信息可以包括以下至少一种:设定事件的类型(反映发生的是哪种事件类型)、发生设定事件的详情信息(反映发生设定事件的详细信息)、以及发生设定事件的数据流信息(反映哪条数据流发生了设定事件)。其中,发生设定事件的数据流信息可以是任何能够反映发生设定事件数据流的信息,例如可以是报文的五元组或二元组等信息。设定事件的详情信息包括但不限于:发生设定事件的原因、发生设定事件的位置(例如端口、队列等)、设定事件发生后引起的结果、发生设定事件的时间等。根据事件 类型的不同,发生设定事件的详情信息也会有所不同,可参见下面示例。
在本实施例中,并不对设定事件进行限定,可以是任何与网络故障有关的事件,具体可根据监控需求、系统特性、系统中的应用特点等因素灵活设定。例如,本申请实施例中的设定事件可以包括但不限于:拥塞事件、暂停事件、丢包事件以及换路事件等中的至少一种。其中,设定事件不同,可编程数据平面从数据流中选取发生设定事件的事件报文的方式也会有所不同,相应地,发生设定事件的详情信息以及设定事件对应的事件信息也会有所不同。下面将结合几种设定事件的定义,对选择事件报文的方式以及对应的事件信息进行示例性说明。
拥塞(congestion)事件:拥塞在数据中心系统及其它网络中都较为常见。例如,假设数据中心系统中的服务器11上部署有MapReduce(映射/归约)等应用,这些应用产生了一类名为incast的流量特征,即多个发送方服务器同时向同一个接收方服务器发送数据。在这种情况下,负责向接收方服务器转发数据的网络交换设备12的端口会经历队列堆积,在其中排队的报文会经历排队延时,形成拥塞。另外,如果数据中心系统采用不公平的负载均衡策略,也可能造成拥塞。
网络交换设备12具有多个入端口和多个出端口,一个报文从一个入端口进入网络交换设备12,在网络交换设备12的内部报文会被交换到一个出端口,从该出端口将报文送出网络交换设备12。对于拥塞事件,可编程的数据平面可以针对网络交换设备12的各出端口,判断各出端口对应的队列中报文的排队延时是否超出设定时延阈值,或判断各出端口对应的队列的长度是否超出设定的长度阈值。若判断结果为是,则确定该出端口上发生了拥塞事件,相应地,该出端口对应的队列中的报文即为发生了拥塞事件的事件报文。
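The per-port decision described above reduces to two threshold comparisons. A Python restatement (the threshold values are placeholders; in the embodiments this logic runs in the programmable data plane) might look like:

```python
DELAY_THRESHOLD_US = 500      # assumed value; the embodiments leave thresholds configurable
QUEUE_LEN_THRESHOLD = 1000    # assumed value, in packets

def is_congested(queuing_delay_us: float, queue_len: int) -> bool:
    """Per egress port/queue: a congestion event occurs if either threshold is exceeded."""
    return queuing_delay_us > DELAY_THRESHOLD_US or queue_len > QUEUE_LEN_THRESHOLD
```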
对于拥塞事件,其详情信息可以包括但不限于:发生拥塞的网络交换设备的信息(如IP地址)发生拥塞的端口信息(如端口号),发生拥塞的队列信息(如队列编号)),发生拥塞的大概时间,以及报文的排队延时或队列长度等;相应地,对应的事件信息可以包括但不限于如下信息:发生拥塞的网 络交换设备的信息(如IP地址)/端口信息(如端口号)/队列信息(如队列编号),遇到拥塞的数据流信息(如五元组或其余可用来标识一条数据流的特征),发生拥塞的大概时间,以及报文的排队延时或队列长度等。
暂停(Pause)事件:若数据中心系统是无损网络,则在无损网络中,对任一网络交换设备来说,若其出端口对应的队列的长度超出设定长度阈值,则该网络交换设备可向其上游设备(即向该网络交换设备的该出端口发送报文的设备)发送PFC(英文为:Priority Flow Control)流控报文,使其上游设备暂停发送报文,直至该出端口对应的队列逐渐排空。然而,在某些场景下,如PFC死锁、队列堵死等情况下,队列不会被排空,上游设备中的报文将会持续被暂停。一旦恶化,可能会形成全网范围内的PFC风暴及流量停发。在本申请实施例中,可以利用数据平面的可编程性,使能数据平面自主识别被暂停的事件报文并向数据处理设备13提供暂停事件对应的事件信息,以便及时、准确地发现PFC风暴、流量停发等问题。
对于暂停事件,基于可编程性,数据平面可记录网络交换设备中各出端口或队列是否处于暂停发送状态。基于此,对于接收到的报文,若该报文需要被路由到网络交换设备的某个出端口(简称为目标出端口),则可编程数据平面可以检测该目标出端口或其对应的队列是否处于暂停发送状态;若是,则确定发生了暂停事件,该报文是经历了暂停事件的事件报文。
对于暂停事件,其详情信息可以包括但不限于:发生暂停的网络交换设备的信息(如IP地址)发生暂停的端口信息(如端口号),发生暂停的队列信息(如队列编号)),以及发生暂停的大概时间等;相应地,对应的事件信息可以包括但不限于如下信息:发生暂停事件的网络交换设备的信息(如IP地址)/端口信息(如端口号)/队列信息(如队列编号),遇到暂停事件的数据流信息(如五元组),遇到暂停事件的时间。
丢包(Drop)事件:在数据平面中,报文可能因为多种原因被丢弃,如拥塞丢包、流水线丢包、链路静默丢包等。丢包可能造成应用性能急剧下降,造成损失。数据平面的丢包事件可以粗略分为设备丢包和链路丢包。对于设 备丢包事件,数据平面可以判断其流水线是否对报文作出了丢弃决定,或出端口对应的队列是否由于拥塞丢弃报文。若是,则确定发生过了设备丢包事件(例如流水线丢包事件或拥塞丢包事件),被丢弃的报文即为发生设备丢包事件的事件报文。
相比之下,链路丢包事件往往难以检测,其原因如下:由于链路故障、折断、污损、接头松动等问题,报文在链路上可能会经历比特翻转事件,即部分比特传输发生错误,从而导致到达链路对端的报文无法通过报文格式校验,会被直接丢弃;另外,由于接收到的报文是错误的,所以无法识别出是哪个数据流经历了链路丢包事件。在本申请实施例中,链路丢包事件中的“链路”包含在上下游两个网络交换设备的数据平面之间所有经过的模块以及连接线路。
对于一个工作正常的链路,上游网络交换设备发送的正确报文个数与下游网络交换设备收到的正确报文个数理论上是相等,且应该是连贯的。基于此,在本实施例中,基于可编程数据平面,提出一种基于报文编号的链路丢包检测方法。如图1c所示,以上游的网络交换设备A向下游的网络交换设备B发送报文为例,该方法包括以下操作:
步骤1:网络交换设备A在将待发送的报文发送出去之前,其可编程数据平面先对待发送的报文进行编号,并在本地缓存每个待发送的报文的编号及其所属的数据流信息。
如图1c所示,网络交换设备A在本地维护一个环形缓存区(Ring buffer),用来缓存待发送报文的编号及其所属的数据流信息,并通过一计数器记录已经缓存的报文个数。由于环形缓冲区的大小有限,在环形缓存区的空间用光时,将从头开始顺次替换。需要说明的是,利用环形缓存区来缓存报文编号及数据流信息仅为示例性实施方式,并不限于这一种缓存方式。例如,也可以采用非环形缓存区来缓存报文编号及数据流信息。
步骤2:网络交换设备A将携带了编号的报文发送至对端的网络交换设备B。
步骤3:网络交换设备B进行丢包检测,即检查收到的正确报文的编号是否连续;若不连续,认为发生了链路丢包事件。
在图1c中,网络交换设备A依次向网络交换设备B发送了编号为10-16的报文,但是网络交换设备B接收到了编号为10-12以及14-16的报文,未接收到编号为13的报文。网络交换设备B在接收到编号为14的报文时,可确定丢失了编号为13的报文,确定发生了链路丢包事件。
步骤4:网络交换设备B向网络交换设备A发送丢包通知消息,该丢包通知消息中携带缺失报文的编号。
步骤5:网络交换设备A接收到丢包通知消息后,查找本地缓存的报文编号以确定发生链路丢包事件的事件报文及其数据流信息。
进一步,如图1c所示,网络交换设备A还可以根据丢包通知消息中携带的缺失报文的编号,结合环形缓存区中记录的信息,确定发生链路丢包的事件报文及其对应的数据流信息。
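A software sketch of the two sides of this numbering scheme follows; the ring-buffer size and message shapes are assumptions, and in the embodiments this logic is implemented in the programmable data planes of the upstream and downstream switching devices.

```python
class UpstreamNumbering:
    """Switch A side: number outgoing packets and remember number -> flow info."""
    def __init__(self, ring_size=4096):
        self.ring = [None] * ring_size      # ring buffer of (seq, flow) entries
        self.next_seq = 0

    def on_send(self, flow):
        seq = self.next_seq
        self.ring[seq % len(self.ring)] = (seq, flow)   # overwrite the oldest entry when full
        self.next_seq += 1
        return seq                                      # carried in the outgoing packet

    def on_loss_notification(self, missing_seqs):
        """Resolve which flows lost packets, so the link-loss event can be reported."""
        lost = []
        for seq in missing_seqs:
            entry = self.ring[seq % len(self.ring)]
            if entry and entry[0] == seq:
                lost.append(entry[1])
        return lost


class DownstreamChecker:
    """Switch B side: detect gaps in the received sequence numbers."""
    def __init__(self):
        self.expected = None

    def on_receive(self, seq):
        missing = []
        if self.expected is not None and seq > self.expected:
            missing = list(range(self.expected, seq))   # these numbers never arrived
        if self.expected is None or seq + 1 > self.expected:
            self.expected = seq + 1
        return missing                                  # send back in a loss notification
```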
对于丢包事件,其详情信息可以包括但不限于:发生丢包的网络交换设备的信息(如IP地址)发生丢包的端口信息(如端口号),发生丢包的队列信息(如队列编号)),发生丢包的大概时间,以及发生丢包的原因等;相应地,对应的事件信息可以包括但不限于如下信息:发生丢包事件的网络交换设备的信息(如IP地址)/端口信息(如端口号)/队列信息(如队列编号),发生丢包的链路信息,发生丢包的数据流信息,发生丢包事件的时间,丢包的原因等。
换路事件:为了保证数据中心系统具有高可靠性,数据中心系统通常具有较高冗余性,在两台服务器11之间可能存在多条等价路径。为充分利用冗余链路的带宽,数据中心系统中运行着如等价多路径(Equal Cost Multiple Path,ECMP)等负载均衡算法,将流量分发到多条路径上。然而,在遇到链路故障、网络交换设备故障、协议故障等情况时,一条或多条链路无法正常使用。负载均衡算法会将数据流重新分配到新的路径上。或者,数据中心系统中运行的路由交换协议(例如BGP、OSPF、IS-IS)的正常收敛,或因为遇 到链路故障、网络交换设备故障等异常情况造成的收敛,也会将数据流重新分配到新的路径上。这些情况称之为换路,及时、准确地捕捉换路事件有助于帮助快速诊断网络故障。
对于换路事件,可编程数据平面可针对每个待发送的报文,检测待发送的报文所属的数据流信息是否是第一次出现,即可编程数据平面会学习经过本网络交换设备的数据流是否是一条新流。若是新流,则确定该数据流是从其他路径换过来的,认为发生了换路事件。当然,对于第一次出现在数据中心系统中的数据流,在本实施例中,也会将其归为发生换路事件的范畴内。
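A minimal sketch of this first-appearance check, using an in-memory set of flow keys; a data-plane implementation would instead learn flows in match tables or registers.

```python
class RerouteDetector:
    """Flag a reroute event the first time a flow key is seen at this switch."""
    def __init__(self):
        self.seen_flows = set()

    def on_packet(self, flow_key) -> bool:
        if flow_key in self.seen_flows:
            return False
        self.seen_flows.add(flow_key)
        return True        # first appearance: treat as a reroute (or new-flow) event
```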
对于换路事件,其详情信息可以包括但不限于:发生换路的网络交换设备的信息(如IP地址)发生换路的端口信息(如端口号),发生换路的队列信息(如队列编号)),发生换路的大概时间以及换路后的新路径(相当于发生换路事件的结果)等;相应地,对应的事件信息可以包括但不限于如下信息:发生换路事件的网络交换设备的信息(如IP地址)/端口信息(如端口号)/队列信息(如队列编号),发生换路事件的数据流信息,出现换路事件的时间信息以及换路后的新路径等。
在一可选实施例中,在从数据流中选取事件报文的过程中,还可以生成事件报文对应的事件元数据。事件元数据是对设定事件进行描述的数据,包括但不限于:设定事件的类型以及发生设定事件的详情信息。在选取事件报文的过程中,可以识别所发生的事件类型以及发生事件的设备/端口/队列、时间、原因、结果等信息,可以将这些信息作为事件报文对应的事件元数据。
无论采用何种方式,在从经过网络交换设备的数据流中选取到发生设定事件的事件报文并生成事件报文对应的事件元数据之后,可编程的数据平面可基于事件报文及其对应的事件元数据,向数据处理设备13提供事件信息。其中,可编程的数据平面向数据处理设备提供事件信息的实现方式包括但不限于以下两种:
方式1:可编程的数据平面从事件报文及其对应的事件元数据中提取事件信息,向数据处理设备提供事件信息。在方式1中,是直接向数据处理设备13 提供事件信息。
方式2:可编程的数据平面向数据处理设备发送事件报文及其对应的事件元数据,以供数据处理设备从事件报文及其对应的事件元数据中提取事件信息。在方式2中,是间接向数据处理设备13提供事件信息。
下面结合图1d-图1e对可编程数据平面采用方式1的详细实施方式进行示例性说明,并结合图1f对可编程数据平面采用方式2的详细实施方式进行示例性说明。
如图1d所示,为方式1下可编程数据平面的一种整体工作原理,包括以下操作:
(1-1)事件报文选取:可编程数据平面从经过其所属网络交换设备的数据流中,选取发生设定事件的事件报文并生成事件报文对应的事件元数据。
实际应用中,对任一数据流来说,其中遇到事件的报文仅占很小一部分,事件报文的选取能够极大降低需要监控的网络流量,与复制全量报文相比,开销可以降低一至两个数量级。
在图1d中,假设可编程数据平面从经过其所属网络交换设备的数据流中,选取到发生事件E1的事件报文为6个,发生事件E2的事件报文为5个,发生事件E3的事件报文为4个,发生事件E4的事件报文为5个,以及发生事件E5的事件报文为4个。事件E1-E5代表不同事件。图1d中,不同样式的长形框(或方形框)表示不同数据流下的事件报文及其对应的事件元数据。其中,数据流s1和s2发生了事件E1和E4;数据流s3发生了事件E1、E4和E5;数据流s4发生了事件E2;数据流s5发生了事件E2和E4;数据流s6、s7发生了事件E3和E5。
对于不同设定事件,选取发生设定事件的事件报文的方式也会有所不同。其中,针对拥塞事件、暂停事件、丢包事件以及换路事件选取事件报文方式可参见前述实施例,在此不再赘述。
(1-2)事件报文去冗余:可编程数据平面对事件报文进行去冗余处理,得到去冗余后的目标事件报文及其对应的事件元数据。
对任一事件来说,在选取到的事件报文中,可能包含同一数据流下的多 个事件报文。如图1d所示,对事件E1,选取到数据流s1下2个事件报文,选取到数据流s3下3个事件报文;对事件E2,选取到数据流s4下2个事件报文,选取到数据流s5下3个事件报文;对事件E3,选取到数据流s6下2个事件报文,选取到事件流s7下2个事件报文;对事件E4,选取到数据流s5下2个事件报文;对事件E5,选取到数据流s7下2个事件报文。
然而,事件的上报仅需包含事件的详情信息及数据流信息即可,与事件报文的个数没有必然关系。基于此,在本操作中对事件报文进行去除冗余处理,优选地,可以以同一条数据流保留一个事件报文为目标,对事件报文进行去冗余处理。这种方式可以在保证事件覆盖率的情况下,进一步降低事件上报流量,节约流量的传输、处理和存储开销。在本实施例中,并不限定对事件报文进行去冗余采用所采用的方法,下面举例说明。
可选地,可以采用基于哈希的去重方法,即每种事件,对发生该事件的各事件报文或事件报文的报文头或报文头中的流信息进行哈希处理,得到哈希值,将哈希值相同的事件报文去除。例如,可以采用Bloom Filter技术实现。
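As an informal sketch of this hash-based option, a Bloom filter keyed by event type and flow can drop later event packets of an already-seen flow; the parameters and hash construction below are illustrative only.

```python
import hashlib

class BloomDedup:
    """Drop later event packets of a flow already seen for a given event type."""
    def __init__(self, m_bits=1 << 16, k=3):
        self.m = m_bits
        self.k = k
        self.bits = bytearray(m_bits // 8)

    def _positions(self, key: bytes):
        for i in range(self.k):
            h = hashlib.blake2b(key, digest_size=8, salt=bytes([i])).digest()
            yield int.from_bytes(h, "big") % self.m

    def seen_before(self, event_type: int, flow_key: bytes) -> bool:
        """Insert and test in one pass: True means the packet is redundant."""
        key = bytes([event_type]) + flow_key
        hit = True
        for pos in self._positions(key):
            byte, bit = divmod(pos, 8)
            if not (self.bits[byte] >> bit) & 1:
                hit = False
                self.bits[byte] |= 1 << bit
        return hit
```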
可选地,还可以采用基于精确匹配的去重方法,即对每种事件,准确学习和记录发生该事件的事件报文所属的数据流信息,将属于这些数据流信息的后续事件报文丢弃,达到去冗余的目的。
可选地,还可以将对事件报文去冗余的约束条件设置为:假阴性为0,即所有经历时间的数据流至少有一个事件报文,同时最小化假阳性,即尽量去除属于同一数据流的报文冗余。针对该约束条件,本申请实施例提供了一种新型的分级分组投票去重方法。在该方法中,可编程数据平面维护一张信息表,为便于区分,将该信息表称为第一信息表;可选地,第一信息表可以是哈希表、精确匹配表、链表、散列等多种数据结构实现。第一信息表中的每个表项用于记录一条数据流信息及其对应的事件报文数量,其中,事件报文数量可以通过计数器记录。
基于上述,针对接收到的每个事件报文,计算该事件报文所属的数据流信息的哈希值,并将该哈希值作为索引,在第一信息表中进行匹配。若未匹 配到与该哈希值对应的目标表项,将该事件报文作为目标事件报文,并将该事件报文所属的数据流信息记录到一个空表项中,开始对该数据流信息对应的事件报文数量进行计数。
进一步,若匹配到与该哈希值对应的目标表项,则将目标表项中记录的数据流信息与该事件报文所属的数据流信息进行比较。若目标表项中记录的数据流信息与该事件报文所属的数据流信息相同,则将目标表项对应的事件报文数量加1,并将该事件报文丢弃。若目标表项中记录的数据流信息与该事件报文所属的数据流信息不相同,则将目标表项对应的事件报文数量减1;并判断减1后的事件报文数量是否为0。若减1后的事件报文数量为0,则将该事件报文作为目标事件报文,并将目标表项中记录的数据流信息替换为该事件报文所属的数据流信息,并重新对事件报文数量进行计数。若减1后的事件报文数量不为0,则将该事件报文丢弃。
采用上述方法,可以提高大数据流留在第一信息项中的概率,确保无假阴性,并尽可能降低假阳性。
需要注意的是,在可编程数据平面资源足够的情况下,上述方法还可以采用多级串联的方式。可选地,不同级别可以采用不同的哈希算法。相同的数据流信息在不同级别会被哈希到不同表项中,从而进一步减少假阳性。
需要注意的是,在可编程数据平面资源受限的情况下,上述方法可以拆分至数据平面中多级流水线中实现。例如,可以在第一级流水线中,确认数据流信息中的IP地址部分是否相同;在第二级流水线中,确认数据流信息中的端口部分是否相同;等等。上述方案还可以进一步拆分为更细的粒度,例如在第一级流水线仅确认数据流信息中的源IP地址是否相同,在第二级流水线中仅确认数据流信息中的目的IP地址是否相同,在第三级流水线中确认数据流信息中的端口部分是否相同,等。其中,具体拆分方式取决于网络交换设备中各级流水线的资源情况。
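The following Python sketch restates the voting logic of this method for a single-level table; the slot count and hash function are assumptions, and a data-plane version would use register arrays and match-action stages split across pipeline levels as described above.

```python
class VotingDedup:
    """Hash-indexed table of (flow, count); keeps roughly one event packet per flow."""
    def __init__(self, slots=1024):
        self.table = [None] * slots          # each slot holds [flow_key, count] or None

    def keep(self, flow_key) -> bool:
        """Return True if this event packet should be kept as the flow's target packet."""
        idx = hash(flow_key) % len(self.table)
        slot = self.table[idx]
        if slot is None:                      # empty slot: record the flow, keep the packet
            self.table[idx] = [flow_key, 1]
            return True
        if slot[0] == flow_key:               # same flow already recorded: packet is redundant
            slot[1] += 1
            return False
        slot[1] -= 1                          # different flow: vote against the incumbent
        if slot[1] == 0:                      # incumbent voted out: take over the slot
            self.table[idx] = [flow_key, 1]
            return True
        return False                          # incumbent survives: drop this packet
```

Large flows therefore tend to stay in the table, which keeps false negatives at zero while keeping false positives low.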
(1-3)事件信息提取:可编程数据平面从目标事件报文及其对应的事件元数据中提取事件信息。
事件报文及其对应的事件元数据中均包含有与事件相关的信息,故可以从事件报文及其对应的事件元数据中提取事件信息,即一部分事件信息来自于事件元数据,一部分事件信息来自于事件报文。一个事件报文中包括很多信息,例如报文头和报文载荷,这些信息中有一些与事件无关的信息,例如报文头中除去可以标识数据流的信息之外其它信息以及报文载荷,属于无用信息。在本操作中,从事件报文中提取与事件相关的部分事件信息,所提取的与事件相关信息的大小(例如为20字节)远小于事件报文,这可进一步降低事件上报流量。对事件元数据中的信息,可以全部作为事件信息上报,也可以选择其中部分上报,对此不做限定。如图1d所示,经过事件信息提取操作后,可得到事件E1-E5对应的事件信息,这些事件信息的大小明显小于事件E1-E5下的事件报文及其对应事件元数据的大小。
在一可选实施例中,可编程数据平面可以维护事件栈(Event Stack),用于暂存事件信息。如图1e所示,事件栈包含一个栈顶计数器(Stack Top Counter)和事件存储两部分。栈顶计数器用于记录事件栈中暂存的事件信息的个数。事件存储用于存储事件信息。可选地,事件存储可以包括一个或多个栈块。基于事件栈,在提取到事件信息后,可以将事件信息存储至事件栈中至少一个栈块中。需要注意的是,对于不同的可编程数据平面,其存储位宽可能有所不同,所以不同可编程数据平面所维护的事件栈中栈块的最大存储能力会有所不同。但是,对同一可编程数据平面来说,各栈块的最大存储能力(即最大位宽)一般是相同的。
可选地,若事件信息的大小小于或等于栈块的最大位宽,可以将事件信息完整地存储至一个栈块中。若事件信息的大小大于栈块的最大位宽,则可以将事件信息拆分为多个信息块,将多个信息块存储至多个栈块中,每个信息块的大小小于或等于栈块的最大位宽。如图1e所示,假设栈块的最大位宽是64比特,事件信息的大小为20字节,则可以将事件信息拆分为3个信息块,3个信息块的大小分别为64比特(即8字节)、64比特和32比特(即4字节),然后将3个信息块存储至图1e所示的3个栈块内,第三个栈块仅占用了32比特, 还有32比特的剩余空间。需要说明的是,除了上述拆分方式之外,也可以将20字节的事件信息拆分为5个32比特的信息块,将5个信息块存储至5个栈块内。
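A software sketch of the event stack described above, assuming 64-bit stack blocks as in the example; the data-plane version would use registers of the platform's native width.

```python
BLOCK_BYTES = 8    # assumed 64-bit stack blocks, matching the example in the text

class EventStack:
    """Stack-top counter plus fixed-width blocks; wide event info spans several blocks."""
    def __init__(self):
        self.blocks = []        # each entry is one block of at most BLOCK_BYTES bytes
        self.block_counts = []  # blocks used by each stacked event, needed for popping
        self.top_counter = 0    # number of event-information items currently stacked

    def push(self, event_info: bytes):
        n = 0
        for off in range(0, len(event_info), BLOCK_BYTES):
            self.blocks.append(event_info[off:off + BLOCK_BYTES])
            n += 1
        self.block_counts.append(n)
        self.top_counter += 1

    def pop(self) -> bytes:
        n = self.block_counts.pop()
        self.top_counter -= 1
        parts = [self.blocks.pop() for _ in range(n)]
        return b"".join(reversed(parts))
```

A 20-byte event-information record thus occupies three blocks (8 + 8 + 4 bytes), as in the example above.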
在从事件报文及其对应的事件元数据中提取事件信息之后,可以对事件报文进行相应处理。例如,对于经历拥塞事件或换路事件的事件报文,可被转发出网络交换设备,而对于经历丢包事件或暂停事件的事件报文,在提取事件信息后可被丢弃。对于暂停事件,可以将发生暂停事件的报文复制一份作为事件报文,由于是事件报文是复制报文,故将其丢弃并不会影响原始报文的后续处理。
(1-4)事件信息批处理:将指定数量个事件信息拼接成一个数据包,将数据包发送给网络交换设备的控制平面或数据处理设备。
如图1d所示,事件信息提取操作去除了事件报文中的无用信息,每个数据流的事件信息比较小,有利于减小存储开销。然而,若将每个事件信息放入一个数据包中上送,会产生大量的小包,这会降低事件信息接收方(即控制平面或数据处理设备)的吞吐量,这会降低事件信息接收方处理事件信息的效率。鉴于此,在本操作中,采用批处理技术,将指定数量个事件信息合并放在一个数据包中上送,可减少数据传输量,有利于提高事件信息接收方(即控制平面或数据处理设备)的吞吐量。
在一可选实施例中,将数据包作为一个载体,触发栈块pop(出栈)栈顶元素的操作,提取栈顶的事件信息;将栈顶的事件信息与已携带的事件信息拼接。若此时,数据包携带的事件信息个数达到指定数量,则将数据包发送出去;然后,复制该数据包,并清空其内容开始下一轮事件信息的收集和拼接。若此时,数据包携带的事件信息个数未达到指定数量,则将数据包循环送回事件栈中,并继续收集栈顶的事件信息,直至数据包携带的事件信息个数达到指定数量为止。在图1e中,实线所示为向栈块内压入(push)事件信息的过程,虚线所示为数据包从栈块内pop栈顶事件信息进行拼接的过程。
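A simplified software restatement of this batching step; the batch size is a placeholder, and the flush of a final partial batch is an addition for the sketch rather than part of the described circulation mechanism.

```python
BATCH_SIZE = 8     # assumed "specified number" of event-information items per data packet

def batch_and_send(event_stack, send_packet):
    """Pop event information off the stack and send it in batches of BATCH_SIZE items."""
    batch = []
    while event_stack:                       # event_stack: list of event-info byte strings
        batch.append(event_stack.pop())      # pop the stack top and append to the packet
        if len(batch) == BATCH_SIZE:
            send_packet(b"".join(batch))     # packet is full: send it out
            batch = []                       # start collecting the next packet
    if batch:                                # optional flush of a final partial batch
        send_packet(b"".join(batch))
```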
(1-5)事件信息去冗余:控制平面或数据处理设备对数据包中的事件信 息进行去冗余处理。
考虑到可编程数据平面的资源、可编程性等限制,可能无法对事件报文做到完全去冗余。如图1d所示,经过去冗余后,事件E1下依旧存在数据流s3的2个事件报文。鉴于此,在经合并后的事件信息可被上送至网络交换设备的控制平面或数据处理设备,之后,可以充分利用控制平面或数据处理设备中处理器(CPU)的处理能力和存储资源,对事件信息进一步去冗余,争取做到一个数据流发生任一事件的事件信息仅出现一次,消除假阳性。在图1d中,以事件信息去冗余操作由数据处理设备实施为例进行图示。
在本实施例中,并不限定对事件信息进行去冗余处理的方式。可选地,在网络交换设备的控制平面(即CPU)或数据处理设备的CPU可以维护第二信息表,第二信息表记录有已经发送给数据处理端的事件信息。第二信息表可以是哈希表、精确匹配表、链表、散列等多种数据结构实现。基于此,在接收到携带有指定数量个事件信息的数据包后,可以从数据包中解析出指定数量个事件信息;对解析出的每个事件信息,检查第二信息表中是否已有相应记录;若是,说明该事件信息是冗余的,可以将该事件信息丢弃;若否,则保留该事件信息,将未被丢弃的事件信息记录到第二信息表中。可选地,若事件信息去冗余操作由网络交换设备的控制平面执行,则可以还可以将未被丢弃的事件信息重新封装为新的数据包,并发送给数据处理设备。对于第二信息表是哈希表的情况下,可以将事件信息进行哈希,将哈希值与哈希表中的哈希值进行比较;若该哈希值已经存在哈希表中,则说明该事件信息是冗余的,可以将其丢弃;反之,说明该事件信息需要被保留。
此外,若事件信息去冗余操作由网络交换设备的控制平面(即CPU)实施,则网络交换设备的控制平面(即CPU)在对事件信息进行去冗余后,还需要将去冗余后的事件信息重新封装成新的数据包,将数据包发送给数据处理设备。可选地,在该过程中,还可以包括:(1-6)流量整形操作,即网络交换设备的控制平面(即CPU)可以对要上报的数据包进行流量整形,以防止突发事件信息产生大量的上送流量冲击网络及数据处理设备。一种流量整 形方式为:网络交换设备的控制平面(即CPU)先将需要上送的数据包缓存在CPU本地,然后,以相对稳定的速率向数据处理设备发送。当然,其他流量整形方式也适用于网络交换设备的控制平面(即CPU)。
在一可选实施例中,网络交换设备的控制平面(即CPU)与数据处理设备之间通过TCP等可靠传输层协议建立可靠连接。基于此,网络交换设备的控制平面(即CPU)可以通过TCP等可靠连接,将整形后的数据包发送给数据处理设备。其中,可靠传输层协议可以实现丢包重传功能,可以保证事件信息的完整性,有利于保证网络管理员基于事件信息定位网络问题的准确性。
在另一可选实施例中,网络交换设备的控制平面(即CPU)与数据处理设备之间通过UDP等不可靠传输层协议建立不可靠连接。基于此,网络交换设备的控制平面(即CPU)可以通过UDP等不可靠连接,将整形后的数据包发送给数据处理设备。这种方式的优势是对网络交换设备的开销较低,但缺点是可能出现丢包,无法保证事件信息的完整性。基于此,数据处理后端在接收到数据包之后,可对接收到的数据包进行完整性校验。
可选地,在采用不可靠连接的情况下,一种对数据完整性的校验方法为:网络交换设备的控制平面(即CPU)为每个传输的事件信息的数据包添加序列号,并在发出数据包之后在本地缓存该数据包一段时间;相应地,数据处理设备接收到数据包之后,可以检测数据包的序列号是否连续;若发现收到的数据包的序列号与之前已收到的数据包的序列号不连续,则可以将缺失数据包的序列号通知给网络交换设备的控制平面(即CPU),网络交换设备的控制平面(即CPU)将重新发送该数据包。在该方式中,通过追踪数据包的序列号的方式,可解决丢包问题,有利于保证事件信息的完整性。
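A sketch of this sequence-number scheme on both sides, with the cache size and message shapes assumed for illustration:

```python
class SenderWithSeq:
    """Control-plane side: tag each event-info packet with a sequence number and cache it."""
    def __init__(self, cache_size=1024):
        self.seq = 0
        self.cache = {}            # seq -> payload, kept for possible retransmission
        self.cache_size = cache_size

    def send(self, payload: bytes, transmit):
        transmit(self.seq, payload)
        self.cache[self.seq] = payload
        if len(self.cache) > self.cache_size:       # evict the oldest cached packet
            self.cache.pop(min(self.cache))
        self.seq += 1

    def resend(self, missing_seqs, transmit):
        for s in missing_seqs:
            if s in self.cache:
                transmit(s, self.cache[s])


class ReceiverWithSeq:
    """Data-processing side: detect gaps in sequence numbers and ask for retransmission."""
    def __init__(self):
        self.expected = 0

    def on_packet(self, seq: int):
        missing = list(range(self.expected, seq)) if seq > self.expected else []
        self.expected = max(self.expected, seq + 1)
        return missing            # report these back to the control plane for resending
```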
(1-7)事件信息保存:数据处理设备13获取可编程的数据平面提供的事件信息,保存事件信息,例如将事件信息存入数据库中,并面向网络管理员提供查询操作,以供网络管理员定位与设定事件相关的网络问题。
可选地,数据处理设备13可以按照事件类型对事件信息进行分类存储。对每个事件信息包括:事件类型(例如拥塞、暂停、换路或丢包),发生事 件的数据流信息,与事件相关的详情信息(例如发生原因、发生端口/队列,发生时间等)等。根据事件类型的不同,事件信息也会有所不同。下面对不同事件对应的事件信息进行举例说明:
拥塞事件:交换机、出端口、出队列、流标识(如<源IP、目的IP、源端口、目的端口、协议>构成的五元组,或<源IP、目的IP>构成的二元组等)、排队延时、队列长度、时间戳(表示拥塞事件发生时间);
暂停事件:交换机、入端口、出端口、出队列、流标识、时间戳(表示暂停事件发生时间);
丢包事件:丢包位置(如交换机流水线、交换机缓存或链路)、丢包原因、流标识、时间戳(表示丢包事件发生时间);
换路事件:交换机、入端口、出端口、出队列、流标识、时间戳(表示换路事件发生时间)。
基于上述较为丰富的事件信息,数据处理设备13可以向网络管理员提供各种维度的查询操作,例如包括但不限于以下至少一种:数据流维度的查询操作、事件维度的查询操作、设备维度的查询操作以及时间维度的查询操作。其中,数据流维度的查询操作是指以指定数据流为查询对象,查询指定数据流在指定时间发生过哪些事件。事件维度的查询操作是指以指定事件为查询对象,查询指定时间发生指定事件的数据流有哪些。设备维度的查询操作是指以指定设备为查询对象,查询指定设备在指定时间段发生过哪些事件。时间维度的查询操作是指以指定时间为查询对象,查询指定时间内各数据流发生过哪些事件。当然,这些维度也可以任意方式进行聚合,形成聚合查询维度。
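As an illustrative sketch, the stored event information could be filtered along these dimensions (the field names are assumptions) and the filters combined freely to form aggregated query dimensions:

```python
def query_events(events, flow=None, event_type=None, device=None, start=None, end=None):
    """Filter saved event records by any combination of flow, event, device, and time."""
    out = []
    for ev in events:          # each ev: dict with 'flow', 'type', 'switch', 'timestamp'
        if flow is not None and ev["flow"] != flow:
            continue
        if event_type is not None and ev["type"] != event_type:
            continue
        if device is not None and ev["switch"] != device:
            continue
        if start is not None and ev["timestamp"] < start:
            continue
        if end is not None and ev["timestamp"] > end:
            continue
        out.append(ev)
    return out
```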
在此说明:在图1d所示实施例中,包括:(1-1)事件报文选取、(1-2)事件报文去冗余、(1-3)事件信息提取、(1-4)事件信息批处理、(1-5)事件信息去冗余、(1-6)流量整形操作和(1-7)事件信息保存。其中,(1-2)事件报文去冗余、(1-4)事件信息批处理、(1-5)事件信息去冗余和(1-6)流量整形操作均为可选操作,这些可选操作可以择一使用,也可以以任意方 式组合使用。另外,在采用上述方式1的情况下,除了(1-1)事件报文选取和(1-3)事件信息提取之外,由可编程数据平面实现的其它操作可以根据需求灵活地移到数据处理设备端实现,得到不同的变形方案。下面介绍几种变形方案。
变形方案1:上述操作(1-1)-(1-4)由可编程数据平面实现,操作(1-5)和(1-7)由数据处理设备实现。
变形方案2:上述操作(1-1)-(1-3)由可编程数据平面实现,操作(1-5)和(1-7)由数据处理设备实现。
变形方案3:上述操作(1-1)和(1-3)由可编程数据平面实现,操作(1-5)和(1-7)由数据处理设备实现。
如图1f所示,为方式2下可编程数据平面的一种工作原理,包括以下操作:
(2-1)事件报文选取:可编程数据平面从经过其所属网络交换设备的数据流中,选取发生设定事件的事件报文并生成事件报文对应的事件元数据。
(2-2)事件报文去冗余:可编程数据平面对事件报文进行去冗余处理,得到去冗余后的目标事件报文及其对应的事件元数据,将目标事件报文及其对应的事件元数据发送给数据处理设备。
(2-3)事件信息提取:数据处理设备从目标事件报文及其对应的事件元数据中提取事件信息。
(2-4)事件信息去冗余:数据处理设备对数据包中的事件信息进行去冗余处理。
(2-5)事件信息保存:数据处理设备保存事件信息,例如将事件信息存入数据库中,并面向网络管理员提供查询操作,以供网络管理员定位与设定事件相关的网络问题。
在图1f所示实施例中,操作(2-1)与图1d所示实施例中的操作(1-1)相同,操作(2-1)中对事件报文进行去冗余过程与图1d所示实施例中操作(1-2)中对事件报文进行去冗余过程相同,故在此均不做赘述。在图1f所示实施例中, 操作(2-3)和(2-4)的原理与图1d所示实施例中操作(1-3)和(1-4)的原理相同,区别在于:在图1d所示实施例中这些操作由数据平面(硬件)实施,在图1f所示实施例中这些操作由数据处理端(软件)实施。故详细实施过程在此不再赘述。
在此说明:在图1f所示实施例中,包括:(2-1)事件报文选取、(2-2)事件报文去冗余、(2-3)事件信息提取、(2-4)事件信息去冗余和(2-5)事件信息保存。其中,(2-2)事件报文去冗余和(2-4)事件信息去冗余均为可选操作,这些可选操作可以择一使用,也可以以任意方式组合使用。另外,在采用上述方式2的情况下,除了(2-1)事件报文选取之外,(2-2)事件报文去冗余也可以移到数据处理设备端实现,得到变形方案4。
变形方案4:上述操作(2-1)由可编程数据平面实现,操作(2-2)-(2-5)由数据处理设备实现。
在极端情况下,上述操作(2-1)也可以移到数据处理设备实现,即得到变形方案5,即上述(2-1)-(2-5)均由数据处理设备实现。在变形方案5中,可编程数据平面可以将数据流的全部报文(无论是否经历事件)上报给数据处理设备,由数据处理设备选取发生设定事件的事件报文,进行事件信息提取等操作。
在此说明,在上述各实施例中,由数据处理设备执行的各种操作具体可由数据处理设备的CPU执行。
在数据中心系统中,可能由于各种软件、硬件配置的问题或故障,网络应用会时常遇到各种性能问题,如连接中断、带宽下降、延时上升等。为了诊断故障原因,网络管理员需要快速、准确地定位出发生故障的设备或链路。在本申请上述实施例中,基于网络交换设备的可编程数据平面,可以将网络问题的定位与系统中数据流遇到的事件关联起来,为快速、准确地定位网络问题提供了机会。具体地,通过对网络交换设备的数据平面进行编程,由数据平面独立地从数据流中精准、及时地识别数据流中遇到设定事件的事件报文。基于这种方案,网络管理员能够全面抓取网络故障导致对流量的影响, 同时最小化开销。进一步,基于数据平面的可编程性,该方案能够持续、并发、实时监控数据流中遇到的事件,包括但不限于:丢包、拥塞、路径变化、暂停等事件,甚至包括传统方法难以诊断的链路静默丢包事件。
进一步,本申请实施例方案还具有如下技术效果:
(1)流事件全覆盖。本申请利用数据平面的可编程性,使能数据平面主动上报全量流事件信息,并通过数据完整性校验保证后端能完整收到所有流事件,实现流事件的全覆盖,使网络监控进入前所未有的细粒度时代。
(2)处理开销最低。本申请利用数据平面的可编程性,精准上报流事件信息,去除所有无用或冗余的信息,保证数据传输和处理开销最小化。
(3)提升网络稳定性。基于全量流事件,网络管理员能够以较高(例如100%)的信心证明网络清白,或实现秒级网络故障的定位,进一步提高网络稳定性。
在上述实施例中,主要介绍了可编程数据平面的功能,并未限定可编程数据平面的实现结构,凡是能够实现上述实施例中描述的各种功能的实现结构均适用于本申请实施例的可编程数据平面。例如,本申请实施例的可编程数据平面可以采用流水线结构。当然,也可以采用非流水线结构。进一步,不同厂商的流水线结构在具体实现上也会各有千秋。在本申请下述实施例中给出一种具体的流水线结构。
图2a为本申请示例性实施例提供的一种网络交换设备的结构示意图。如图2a所示,该网络交换设备20包括:控制平面21和可编程的数据平面22。控制平面21与可编程的数据平面22分离,但两者之间可通信。控制平面21相当于网络交换设备的大脑,负责实现网络交换设备的控制逻辑,例如协议报文转发、协议表项计算、维护等都属于控制平面21的范畴。可编程的数据平面22负责网络交换设备的数据交换功能,例如报文的接收、解封装、封装、转发等都属于可编程的数据平面22的范畴。
在本实施例中,数据平面22具有可编程性,基于数据平面22的可编程性, 允许用户根据自己的应用需求自定义数据平面22的功能。在本实施例中,数据平面22被编程,具有以下功能:可从经过网络交换设备20的数据流中,选取发生设定事件的事件报文;基于事件报文向数据处理端提供事件信息,所述事件信息用于描述发生设定事件的相关信息,可供定位与设定事件相关的网络问题。
其中,经过网络交换设备20的数据流可能是一条,也可能是多条,无论是一条数据流还是多条数据流,可编程的数据平面22能够识别出每条数据流中发生的设定事件,并可选取发生设定事件的事件报文。其中,事件报文是数据流中发生设定事件的报文,或者是数据流中遇到设定事件的报文。
其中,事件信息用于描述发生设定事件的相关信息,可供定位与设定事件相关的网络问题(例如故障位置或设备)。在本实施例中,并不对事件信息的内容进行限定,凡是能够描述发生设定事件的相关信息均适用于本申请实施例。例如,事件信息可以包括以下至少一种:设定事件的类型、设定事件的详情信息以及发生设定事件的数据流信息(。其中,发生设定事件的数据流信息可以是任何能够反映发生设定事件数据流的信息,例如可以是报文的五元组或二元组等信息。设定事件的详情信息包括但不限于:发生设定事件的原因、发生设定事件的位置(例如端口、队列等)、设定事件发生后引起的结果、发生设定事件的时间等。根据事件类型的不同,发生设定事件的详情信息也会有所不同。
在本实施例中,并不对设定事件进行限定,可以是任何与网络故障有关的事件,具体可根据监控需求、系统特性、系统中的应用特点等因素灵活设定。例如,本申请实施例中的设定事件可以包括但不限于:拥塞事件、暂停事件、丢包事件以及换路事件等中的至少一种。关于拥塞事件、暂停事件、丢包事件以及换路事件的详细说明可参见前述系统实施例中的描述,在此不再赘述。
在本实施例中,可编程的数据平面22为流水线结构。如图2a所示,可编程的数据平面22依次包括:入端流水线(ingress pipeline)221、缓存管理单元 (Memory management unit,MMU)222和出端流水线(egress pipeline)223。
入端流水线221、MMU 222以及出端流水线223依次对经过网络交换设备20的数据流进行报文接收处理、报文交换处理和报文发送处理。即,一条数据流中的报文首先到达入端流水线221,入端流水线221对报文进行接收处理;这里的接收处理包括但不限于:将报文暂存至入端缓存中,对报文进行正确性校验,为报文查找路由表以确定报文对应的目标出端口等。MMU 222主要对网络交换设备20的缓存进行管理,管理网络交换设备20各出端口对应的队列(一个队列占用部分缓存区域),负责将报文从入端缓存中拷贝到目标出端口对应的队列中,等等。出端流水线223主要负责将每个出端口对应队列中的报文发送出去,在发送出去之前还可以对报文进行校验等。
在本实施例中,入端流水线221、MMU 222以及出端流水线223除了具有上述传统报文处理功能之外,还可被编程以实现事件上报功能。具体地,入端流水线221还用于在对经过网络交换设备20的数据流进行报文接收处理的过程中,选取发生设定事件的事件报文,并将选取的事件报文及其对应的事件元数据上报给出端流水线223;MMU 222还用于在对经过网络交换设备20的数据流进行报文交换处理的过程中,选取发生设定事件的事件报文,并将选取的事件报文及其对应的事件元数据上报给出端流水线223;出端流水线223,还用于在对经过网络交换设备20的数据流进行报文发送处理的过程中,选取发生设定事件的事件报文,并根据本端选取的事件报文及其对应的事件元数据以及由入端流水线221和MMU 222上报的事件报文及其对应的事件元数据向数据处理端上报事件信息。
需要说明的是,在不同报文处理过程中可能发生的事件类型会有所不同,因此,在报文接收处理的过程中发生的设定事件,在报文交换处理的过程中发生的设定事件,以及在报文发送处理的过程中发生的设定事件可能会有所不同。例如,在设定事件包括:拥塞事件、丢包事件、暂停事件以及换路事件的情况下,在报文接收处理的过程中可能发生流水线丢包事件和/或暂停事件,在报文交换处理的过程中可能发生缓存丢包事件,在报文发送处理的过 程中可能发生拥塞事件、换路事件、流水线丢包事件和/或链路丢包事件等。其中,缓存丢包事件、链路丢包事件以及流水线丢包事件都属于丢包事件。相应地,入端流水线221需要在报文接收处理的过程中,选取发生丢包事件和/或暂停事件的事件报文,并将发生丢包事件和/或暂停事件的事件报文以及对应的事件元数据上报给出端流水线223;MMU 222需要在报文交换处理的过程中,选取发生缓存丢包事件的事件报文,并将发生缓存丢包事件的事件报文以及对应的事件元数据上报给出端流水线223;出端流水线223需要在报文发送处理的过程中,选取发生拥塞事件、换路事件、流水线丢包事件和/或链路丢包事件的事件报文,进而根据自身选取出的事件报文及其对应的事件元数据和接收到的事件报文及其对应的事件元数据向数据处理端上报事件信息。
在本申请实施例中,并不限定入端流水线221、MMU 222以及出端流水线223的具体实现结构,凡是可以选取相应事件报文的实现结构均适用于本申请实施例。在本申请下述实施例中,针对端流水线221、MMU 222以及出端流水线223分别给出一种示例性的实现结构。
如图2b所示,入端流水线221的一种实现结构包括:入端事件检测模块2211。入端事件检测模块2211主要用于在报文接收处理的过程中,选取发生设定事件的事件报文,生成事件报文对应的事件元数据,并将事件报文及其对应的事件元数据上报给出端流水线223。
除入端事件检测模块2211之外,入端流水线221还包括:一些用于对报文进行接收处理的流水线模块,主要包括图2b中示出的查表模块2212(Tables lookup)。查表模块2212主要用于为接收到的各报文查找路由表,如果查找到该报文对应的路由信息,则可以确定该报文对应的目标出端口。在确定报文对应的目标出端口之后,报文会被拷贝到目标出端口对应的队列中,等待出端流水线223将其从该出端口发送出去。
在查路由表过程中,如果未查找到该报文对应的路由信息,或者,查找到的目标出端口故障,就会丢弃该报文(即发生流水线丢包事件)。另外,在将报文拷贝到目标出端口对应的队列中之前,需要检测目标出端口的工作 状态。出端口的工作状态包括:正常发送状态、暂停发送状态和故障状态。若目标出端口的工作状态处于暂停发送状态,则意味着该报文遇到了暂停事件,无法被及时拷贝到目标出端口对应的队列中。进一步可选地,入端流水线221还可以包括对接收到的报文进行格式等各种校验的校验模块;如果报文未通过校验,会被丢弃(即发生流水线丢包事件);如果报文通过校验,查表模块2212就会为报文进行查表。当然,用于对报文进行校验的校验模块为可选模块,而非必选模块。
进一步,根据报文接收处理的过程中可能发生的事件类型以及设定事件的类型,入端事件检测模块2211可以包括:入端流水线丢包检测模块202和暂停事件检测模块201中的至少一个。
其中,入端流水线丢包检测模块202,用于检测在报文接收处理的过程中是否发生流水线丢包事件,并在为是的情况下,生成事件元数据,将发生流水线丢包事件的报文作为事件报文连通事件元数据一并上报给出端流水线223。其中,报文接收处理的过程由入端流水线221中的流水线(例如校验模块和查表模块2212)执行,故将报文接收处理过程中的丢包事件称为流水线丢包事件。
根据报文接收处理过程的不同,入端流水线丢包检测模块202检测是否发生流水线丢包事件的方式也会有所不同。在一可选实施例中,报文接收处理的过程包括针对接收到的各报文查路由表的过程,则入端流水线丢包检测模块202具体可用于:检测针对接收到的各报文查路由表的过程中是否发生丢包;若检测到查路由表过程中发生丢包,确定发生流水线丢包事件。在另一可选实施例中,报文接收处理的过程包括针对接收到的各报文查路由表的过程和针对接收到的各报文的校验过程,则入端流水线丢包检测模块202具体可用于:检测针对接收到的各报文查路由表的过程中是否发生丢包,并检测针对接收到的各报文的校验过程中是否发生丢包;若检测到任一过程中发生丢包,确定发生流水线丢包事件。在又一可选实施例中,报文接收处理的过程包括针对接收到的各报文的校验过程,则入端流水线丢包检测模块202具体可 用于:检测针对接收到的各报文的校验过程中是否发生丢包;若检测到检验过程中发生丢包,确定发生流水线丢包事件。
其中,暂停事件检测模块201,用于检测在报文接收处理的过程中是否发生暂停事件,并在为是的情况下,生成事件元数据,将发生暂停事件的报文作为事件报文连同事件元数据一并上报给出端流水线223。其中,可以将发生暂停事件的报文复制一份作为事件报文,以降低事件上报对后续报文处理的影响。若网络交换设备20中某个出端口的工作状态处于暂停发送状态,且接收到的报文中有报文还需要路由至该出端口,则认为发生了暂停事件。基于此,暂停事件检测模块201具体用于:在接收到的报文需要被路由至目标出端口的情况下,检测目标出端口是否处于暂停发送状态,若是,确定发生暂停事件。
进一步,如图2b所示,入端流水线221还包括:入端链路丢包检测模块2213,用于检测经过网络交换设备20的数据流是否发生链路丢包事件,并在为是的情况下,向上游设备发送丢包通知消息,以通知上游设备发生了链路丢包事件。
关于链路丢包事件的检测可由网络交换设备20和其上游设备相互配合实现。具体地,上游设备在向网络交换设备20发送报文之前,可为报文添加编号,并在本地缓存报文的编号及其数据流信息一段时间。对网络交换设备20来说,其会接收到带有编号的报文,入端链路丢包检测模块2213具体通过检测来自上游设备的报文的编号是否连续,来判断链路上是否发生丢包;若连续,确定链路上未发生丢包;若不连续,确定链路上发生了丢包,即发生了链路丢包事件。对入端链路丢包检测模块2213来说,通过将接收到的报文的编号进行比较,可以获知丢失报文的编号,但是却无法知道丢失报文具体是谁,也无法知道丢失报文所属的数据流信息,这些信息只有上游设备知道,故入端链路丢包检测模块2213在确定发生链路丢包事件的情况下,可将丢失报文的编号携带在丢包通知消息中一并上报给上游设备,这样上游设备不仅可以确定发生了链路丢包事件,还可以确定发生链路丢包事件的事件报文及 其所属的数据流信息,进而可向数据处理端进行事件上报。
对于进入网络交换设备20的报文,查表模块2212会为报文查找对应的目标出端口,在为报文查找到目标出端口的情况下,报文会被缓存到目标出端口对应的队列中,等待发送。在该缓存过程中,如果目标出端口的队列已满,则该报文会被丢弃(即发生缓存丢包事件)。基于此,如图2b所示,MMU 222的一种实现结构包括:缓存丢包检测模块2221,用于检测在向各出端口对应的队列中缓存报文的过程中是否发生缓存丢包事件,并在为是的情况下,生成事件元数据,将发生缓存丢包事件的报文作为事件报文连通事件元数据一并上报给出端流水线223。
如图2b所示,出端流水线223的一种实现结构包括:事件报文处理模块2232和出端事件检测模块2231。其中,出端事件检测模块2231,主要用于在报文发送处理的过程中,选取发生设定事件的事件报文,并将事件报文及其对应的事件元数据上报给事件报文处理模块2232。事件报文处理模块2232用于接收入端流水线221(具体是入端流水线221中各个入端事件检测模块2211)以及MMU 222(具体是指MMU 222中的缓存丢包检测模块2221)上报的事件报文及其对应的事件元数据,并接收出端事件检测模块2231发送的事件报文及其对应的事件元数据,根据这些事件报文及其对应的事件元数据向数据处理端提供事件信息。
除出端事件检测模块2231之外,出端流水线223还包括:一些用于对报文进行发送处理的流水线模块,例如对待发送的报文进行格式等各种校验的校验模块;如果报文未通过校验,会被丢弃(即发生流水线丢包事件);如果报文通过校验,报文会被发送出去。当然,用于对报文进行校验的校验模块为可选模块,而非必选模块。另外,报文被缓存至目标出端口对应的队列中之后,会等待发送。在等待发送过程中,可能会因为出端口拥塞而被丢包。进一步,在报文被发送出去之后,也可能会发生链路丢包。再者,报文也由可能因为原本链路故障而被重新分配到网络交换设备20所在的链路上,即还可能发生换路事件。
进一步,根据报文发送处理的过程中可能发生的事件类型以及设定事件的类型,如图2b所示,出端事件检测模块2231可以包括:拥塞事件检测模块203、换路事件检测模块204、出端流水线丢包检测模块205和出端链路丢包检测模块206中的至少一个。
拥塞事件检测模块203,用于检测网络交换设备20的各出端口是否发生拥塞事件,并在为是的情况下,生成事件元数据,将发生拥塞事件的报文作为事件报文连同事件元数据一并发送给事件报文处理模块2232。可选地,拥塞事件检测模块203具体用于:针对各出端口,判断该出端口对应的队列中报文的排队延时是否超出设定时延阈值,或者判断该出端口对应的队列的长度是否超出设定的长度阈值;若是,确定该出端口发生了拥塞事件。该出端口上排队的报文即为发生拥塞事件的事件报文。
换路事件检测模块204,用于检测网络交换设备20中是否发生换路事件,并在为是的情况下,生成事件元数据,将发生换路事件的报文作为事件报文连同事件元数据一并发送给事件报文处理模块2232。可选地,换路事件检测模块204具体用于:针对每个待发送的报文,检测该待发送的报文所属的数据流信息(例如五元组或二元组)是否是第一次出现;若是,确定发生换路事件。
出端流水线丢包检测模块205,用于检测在报文发送处理的过程中是否发生流水线丢包事件,并在为是的情况下,生成事件元数据,将发生流水线丢包事件的报文作为事件报文连同事件元数据一并发送给事件报文处理模块2232。在一可选实施例中,报文发送处理的过程包括:对每个待发送的报文进行校验的过程,则出端流水线丢包检测模块205具体用于:检测在对每个待发送的报文进行校验的过程中是否发生丢包,若是,确定发生流水线丢包事件。
出端链路丢包检测模块206,用于检测在报文发送处理的过程中是否发生链路丢包事件,并在为是的情况下,生成事件元数据,将发生链路丢包事件的报文作为事件报文连同事件元数据一并发送给事件报文处理模块2232。
在一可选实施例中,网络交换设备20可与其下游设备相互配完成链路丢包检测。具体地,出端链路丢包检测模块206在将每个待发送的报文发送出去之前,对每个待发送的报文进行编号,以供下游设备根据报文编号协助判断是否发生链路丢包事件;以及检测是否接收到下游设备在确定发生链路丢包事件时返回的丢包通知消息,若是,确定发生链路丢包事件。下游设备会接收到网络交换设备20发送的带有编号的报文,通过判断报文编号是否连续可以确定其与网络交换设备20之间的链路上是否发生丢包。进一步,在发生链路丢包的情况下,下游设备还可以将缺失报文的编号携带在丢包通知消息中一并提供给网络交换设备20中的出端链路丢包检测模块206。出端链路丢包检测模块206具体还用于:在本地缓存每个待发送的报文的编号及其所属的数据流信息;以及根据丢包通知消息中携带的缺失报文的编号,确定发生链路丢包的事件报文及其所属的数据流信息。
如图2b所示,事件报文处理模块2232分别与入端流水线丢包检测模块202、暂停事件检测模块201、缓存丢包检测模块2221、拥塞事件检测模块203、换路事件检测模块204、出端流水线丢包检测模块205和出端链路丢包检测模块206通信连接。入端流水线丢包检测模块202、暂停事件检测模块201、缓存丢包检测模块2221、拥塞事件检测模块203、换路事件检测模块204、出端流水线丢包检测模块205和出端链路丢包检测模块206可通过内部端口(Internal port)将选取出的事件报文及其对应的事件元数据发送给事件报文处理模块2232。
在一可选实施例中,事件报文处理模块2232具体用于:将接收到的事件报文及其对应的事件元数据发送给数据处理端,以供数据处理端从事件报文及其对应的事件元数据中提取事件信息。可选地,事件报文处理模块2232可以直接将接收到的事件报文及其对应的事件元数据发送给数据处理端。或者,事件报文处理模块2232可以对接收到的事件报文进行去冗余处理,得到目标事件报文,将目标事件报文及其对应的事件元数据发送给数据处理端。其中,对事件报文进行去冗余处理,可以在保证事件覆盖率的情况下,进一步降低 事件上报流量,节约流量的传输、处理和存储开销。关于去冗余处理的方式可参见下述实施例中的描述,暂不详述。
在另一可选实施例中,事件报文处理模块2232具体用于:从接收到的事件报文及其对应的事件元数据中提取事件信息,并将事件信息提供给数据处理端。进一步可选地,事件报文处理模块2232可以对接收到的事件报文进行去冗余处理,得到目标事件报文;之后,从目标事件报文及其对应的事件元数据中提取事件信息,将事件信息提供给数据处理端。其中,对事件报文进行去冗余处理,可以在保证事件覆盖率的情况下,进一步降低事件上报流量,节约流量的传输、处理和存储开销。
在本实施例中,并不限定事件报文处理模块2232对事件报文进行去冗余处理所采用的方式,例如可以采用基于哈希的去重方法,或者采用基于精确匹配的去重方法,或者采用本申请实施例提供的分级分组投票去重方法。其中,采用的去重方法不同,事件报文处理模块2232的实现结构就会不同。在本申请实施例中,以采用本申请实施例提供的分级分组投票去重方法为例,给出事件报文处理模块2232的一种实现结构。
在分级分组投票去重方法中,事件报文处理模块2232可以以一条数据流保留一个事件报文为目标,对接收到的事件报文进行去冗余处理,得到目标事件报文。事件报文处理模块2232包括:去冗余子模块,并会维护第一信息表;第一信息表中的每个表项用于记录一条数据流信息及其对应的事件报文数量。
其中,去冗余子模块用于:针对接收到的每个事件报文,将该事件报文所属的数据流信息的哈希值作为索引,在该第一信息表中进行匹配;若未匹配到对应的目标表项,将该事件报文作为目标事件报文,并将该事件报文所属的数据流信息记录到一个空表项中,开始对事件报文数量进行计数;若匹配到对应的目标表项,且目标表项中记录的数据流信息与该事件报文所属的数据流信息相同,则将目标表项对应的事件报文数量加1;若匹配到对应的目标表项,但目标表项中记录的数据流信息与该事件报文所属的数据流信息不 相同,则将目标表项对应的事件报文数量减1;以及若减1后的事件报文数量为0,则将该事件报文作为目标事件报文,并将目标表项中记录的数据流信息替换为该事件报文所属的数据流信息,并重新对事件报文数量进行计数。
进一步,事件报文处理模块2232还包括:事件提取子模块、事件栈以及批处理子模块。其中,事件栈包括栈顶计数器和至少一个栈块。
事件提取子模块,用于从去冗余子模块得到的目标事件报文及其对应的事件元数据中提取事件信息,将事件信息存储至事件栈中的至少一个栈块中。可选地,事件提取子模块具体用于:在事件信息的大小大于栈块的最大位宽时,将事件信息拆分为多个信息块,将多个信息块存储至多个栈块中;每个信息块的大小小于或等于所述最大位宽。事件提取子模块所实现的各操作的详细描述,可参见前述实施例,在此不再赘述。
栈顶计数器,用于记录至少一个栈块中暂存的事件信息的个数。批处理子模块,用于从至少一个栈块中提取指定数量个事件信息,将指定数量个事件信息拼接成一个数据包,将该数据包提供给数据处理端。指定数量可根据数据平面的流水线资源、带宽以及应用场景等因素灵活设定,对此不做限定。例如,指定数量可以是5,8,10等。
可选地,批处理子模块可直接将携带有事件信息的数据包发送给数据处理端;或者,也可以将携带有事件信息的数据包上报给网络交换设备20的控制平面21,由控制平面21将该数据包发送给数据处理端。批处理子模块所实现的各操作的详细描述,可参见前述实施例,在此不再赘述。
进一步,控制平面21在将数据包发送给数据处理端之前,还可以对数据包中携带的事件信息进行去冗余,以在保证事件覆盖率的情况下,进一步降低事件上报流量,节约流量的传输、处理和存储开销。
如图2b所示,控制平面21包括:处理器211和存储器212;存储器212用于存储计算机程序;处理器211执行计算机程序,以用于:对数据包中的事件信息进行去冗余处理,得到新的数据包;并将新的数据包发送给数据处理端。
进一步,处理器211可在本地维护第二信息表,第二信息表用于记录已经 发送给数据处理端的事件信息。基于此,处理器211具体用于:从接收到的数据包中解析出指定数量个事件信息;针对解析出的每个事件信息,检查第二信息表中是否已有相应记录;若是,则丢弃该事件信息;进而,将未被丢弃的事件信息重新封装为新的数据包。进一步,处理器211还用于:将未被丢弃的事件信息记录到第二信息表中,以便对后续接收到的事件信息进行去冗余。
进一步,处理器211还用于对发往网络交换设备新的数据包进行流量整形,以防止突发事件信息产生大量的上送流量冲击网络及数据处理端。关于对事件信息进行去冗余和流量整形的相关描述,可参见前述系统实施例,在此不再赘述。
图3a为本申请示例性实施例提供的一种配置方法的流程示意图。该方法用于对上述实施例提供的网络交换设备进行配置,主要用于对网络交换设备中可编程的数据平面进行功能配置。如图3a所示,该方法包括以下步骤:
31a、响应于配置操作,获取网络交换设备中可编程的数据平面所需的配置文件。
32a、将上述配置文件配置至可编程的数据平面中,以完成配置操作;其中,可编程的数据平面被配置为:从经过网络交换设备的数据流中,选取发生设定事件的事件报文;基于事件报文向数据处理端提供事件信息;所述事件信息用于描述发生设定事件的相关信息,可供定位与设定事件相关的网络问题。
在本实施例中，可以使用各种硬件编程语言和编译工具来生成数据平面所需的配置文件，例如可以采用但不限于P4（英文为programming protocol-independent packet processor）语言，P4语言是一种主要用于数据平面的编程语言。在生成配置文件之后，可以通过数据平面支持的接口将配置文件上传至数据平面中。
在本实施例中,网络交换设备的数据平面是可编程的,网络用户可以根据自己的应用需求自定义数据平面的功能,实现与协议无关的网络数据处理流程。关于数据平面被编成后所具有的功能,可参见前述实施例的描述,在 此不再赘述。
图3b为本申请示例性实施例提供的一种信息处理方法的流程示意图。该方法适用于图2a-2b所示实施例中的网络交换设备,具体适用于网络交换设备中可编程的数据平面,但并不限于前述实施例中的可编程数据平面。该方法同样适用于一些具有与前述实施例中可编程数据平面相同或类似功能的不可编程数据平面。如图3b所示,该方法包括:
31b、从经过网络交换设备的数据流中,选取发生设定事件的事件报文;
32b、基于事件报文向数据处理端提供事件信息,事件信息用于描述发生设定事件的相关信息,可供定位与设定事件相关的网络问题。
其中,事件信息用于描述发生设定事件的相关信息,可供定位与设定事件相关的网络问题(例如故障位置或设备)。在本实施例中,并不对事件信息的内容进行限定,凡是能够描述发生设定事件的相关信息均适用于本申请实施例。例如,事件信息可以包括以下至少一种:设定事件的类型、设定事件的详情信息以及发生设定事件的数据流信息。其中,发生设定事件的数据流信息可以是任何能够反映发生设定事件数据流的信息,例如可以是报文的五元组或二元组等信息。设定事件的详情信息包括但不限于:发生设定事件的原因、发生设定事件的位置(例如端口、队列等)、设定事件发生后引起的结果、发生设定事件的时间等。根据事件类型的不同,发生设定事件的详情信息也会有所不同。
在本实施例中,并不对设定事件进行限定,可以是任何与网络故障有关的事件,具体可根据监控需求、系统特性、系统中的应用特点等因素灵活设定。在一可选实施例中,上述设定事件包括以下至少一种类型:拥塞事件、暂停事件、丢包事件以及换路事件。关于拥塞事件、暂停事件、丢包事件以及换路事件的定义和说明,可参见前述实施例,在此不再赘述。
在一可选实施例中,上述从经过网络交换设备的数据流中,选取发生设定事件的事件报文,包括以下至少一种选取操作:
在对经过网络交换设备的数据流进行报文接收处理的过程中,选取发生 设定事件的事件报文并生成事件报文对应的事件元数据;
在对经过网络交换设备的数据流进行报文交换处理的过程中,选取发生设定事件的事件报文并生成事件报文对应的事件元数据;
在对经过网络交换设备的数据流进行报文发送处理的过程中,选取发生设定事件的事件报文并生成事件报文对应的事件元数据。
进一步,上述在对经过网络交换设备的数据流进行报文接收处理的过程中,选取发生设定事件的事件报文,包括以下至少一种操作:
检测在报文接收处理的过程中是否发生流水线丢包事件,并在为是的情况下,将发生流水线丢包事件的报文作为事件报文;
检测在报文接收处理的过程中是否发生暂停事件,并在为是的情况下,将发生暂停事件的报文作为事件报文。可选地,可以将发生暂停事件的报文复制一份作为事件报文,以降低事件上报对后续报文处理的影响。
进一步,上述在对经过网络交换设备的数据流进行报文交换处理的过程中,选取发生设定事件的事件报文,包括:检测在向网络交换设备的多个出端口对应的队列中缓存报文的过程中是否发生缓存丢包事件,在为是的情况下,将发生缓存丢包事件的报文作为事件报文。
进一步,上述在对经过网络交换设备的数据流进行报文发送处理的过程中,选取发生设定事件的事件报文,包括以下至少一种操作:
检测网络交换设备的各出端口是否发生拥塞事件,并在为是的情况下,将发生拥塞事件的报文作为事件报文;
检测网络交换设备中是否发生换路事件,并在为是的情况下,将发生换路事件的报文作为事件报文;
检测在报文发送处理的过程中是否发生流水线丢包事件,并在为是的情况下,将发生流水线丢包事件的报文作为事件报文;
检测在报文发送处理的过程中是否发生链路丢包事件,并在为是的情况下,将发生链路丢包事件的报文作为事件报文。
在一可选实施例中,基于事件报文向数据处理端提供事件信息,包括: 将事件报文及其对应的事件元数据发送给数据处理端,以供数据处理端从事件报文及其对应的事件元数据中提取事件信息;或者,从事件报文及其对应的事件元数据中提取事件信息,并将事件信息提供给数据处理端。
进一步可选地,在将事件报文及其对应的事件元数据发送给数据处理端之前,或者在从事件报文及其对应的事件元数据中提取事件信息之前,该方法还包括:以一条数据流保留一个事件报文为目标,对事件报文进行去冗余处理,得到目标事件报文。
在一可选实施例中,对事件报文进行去冗余处理,得到目标事件报文,包括:针对每个事件报文,将该事件报文所属的数据流信息的哈希值作为索引,在第一信息表中进行匹配;第一信息表中的每个表项用于记录一条数据流信息及其对应的事件报文数量;若未匹配到对应的目标表项,将事件报文作为目标事件报文,并将该事件报文所属的数据流信息记录到一个空表项中,开始对事件报文数量进行计数;若匹配到对应的目标表项,且目标表项中记录的数据流信息与该事件报文所属的数据流信息相同,则将目标表项对应的事件报文数量加1;若匹配到对应的目标表项,但目标表项中记录的数据流信息与事件报文所属的数据流信息不相同,则将目标表项对应的事件报文数量减1;以及若减1后的事件报文数量为0,则将该事件报文作为目标事件报文,并将目标表项中记录的数据流信息替换为该事件报文所属的数据流信息,并重新对事件报文数量进行计数。
在一可选实施例中,从事件报文及其对应的事件元数据中提取事件信息之后,所述方法还包括:将事件信息存储至事件栈中的至少一个栈块中。相应地,将事件信息提供给数据处理端,包括:从至少一个栈块中提取指定数量个事件信息,将指定数量个事件信息拼接成一个数据包,将数据包提供给数据处理端。
在一可选实施例中,将数据包提供给数据处理端,包括:数据平面直接将数据包发送给数据处理端;或者数据平面将数据包上报给控制平面,以供控制平面将数据包发送给数据处理端。
在一可选实施例中,所述方法还包括:控制平面对数据包中的事件信息进行去冗余处理,得到新的数据包;控制平面将新的数据包发送给数据处理端。
进一步可选地,控制平面对数据包中的事件信息进行去冗余处理,得到新的数据包,包括:从数据包中解析出指定数量个事件信息;针对解析出的每个事件信息,检查第二信息表中是否已有相应记录;若是,则丢弃事件信息;将未被丢弃的事件信息重新封装为新的数据包;其中,第二信息表记录有已经发送给数据处理端的事件信息。
进一步可选地,所述方法还包括:控制平面在发送新的数据包过程中,对新的数据包进行流量整形。
在本实施例中,网络交换设备具有可编程的数据平面,利用数据平面的可编程性,使能数据平面准确、及时地选取事件报文,并基于事件报文精准、快速地向数据处理端上报事件信息,数据处理端保存事件信息,以事件信息为基础面向网络管理员提供查询操作,为网络管理员准确、快速地定位网络问题提供了基础,可解决网络问题定位准确度差、速度慢等问题。
图4a为本申请示例性实施例提供的另一种信息处理方法的流程示意图。该方法适用于数据处理端。如图4a所示,该方法包括:
41a、接收网络交换设备发送的事件信息,事件信息用于描述经过网络交换设备的数据流发生设定事件的相关信息;
42a、保存事件信息,并面向网络管理员提供查询操作,以供网络管理员定位与设定事件相关的网络问题。
关于网络交换设备如何从数据流中选取事件报文以及如何从事件报文中提取事件信息的内容,可参见前述实施例,在本实施例中不做详述。
在一可选实施例中,事件信息包括以下至少一种:设定事件的类型、设定事件的详情信息以及发生设定事件的数据流信息。
在一可选实施例中,上述查询操作包括以下至少一种:数据流维度的查询操作、事件维度的查询操作、设备维度的查询操作以及时间维度的查询操 作。
在一可选实施例中,接收网络交换设备发送的事件信息,包括:接收网络交换设备发送的数据包;从数据包解析出多个事件信息。通过数据包携带多个事件信息,可实现对事件信息的批处理,有利于减少数据传输量,有利于提高数据处理设备的吞吐量。
在一可选实施例中,在保存事件信息之前,还包括:对事件信息进行去冗余处理。这可减少事件信息的冗余,节约存储资源。
在本实施例中,以事件信息为基础面向网络管理员提供查询操作,为网络管理员准确、快速地定位网络问题提供了基础,可解决网络问题定位准确度差、速度慢等问题。
图4b为本申请示例性实施例提供的又一种信息处理方法的流程示意图。该方法适用于数据处理端。如图4b所示,该方法包括:
41b、接收网络交换设备发送的事件报文及其对应的事件元数据,事件报文是经过网络交换设备的数据流中发生设定事件的报文;
42b、从事件报文及其对应的事件元数据中提取事件信息,事件信息用于描述发生设定事件的相关信息;
43b、保存事件信息,并面向网络管理员提供查询操作,以供网络管理员定位与设定事件相关的网络问题。
关于网络交换设备如何从数据流中选取事件报文的内容,可参见前述实施例,在本实施例中不做详述。
在一可选实施例中,在从事件报文及其对应的事件元数据中提取事件信息之前,还包括:以一条数据流保留一个事件报文为目标,对事件报文进行去冗余处理,得到目标事件报文。相应地,从事件报文及其对应的事件元数据中提取事件信息,具体为:从目标事件报文及其对应的事件元数据中提取事件信息。由于数据处理端的处理能力较为强大,此处去冗余处理可以采用多种方法,例如基于哈希的去重方法,基于精准匹配的去重方法,等等。
在一可选实施例中,在保存事件信息之前,还包括:对事件信息进行去 冗余处理。同理,由于数据处理端的处理能力较为强大,此处去冗余处理可以采用多种方法,例如基于哈希的去重方法,基于精准匹配的去重方法,等等。
需要说明的是,考虑到数据处理端的处理能力较为强大,如果能够彻底对事件报文进行去冗余,则可以无需执行对事件信息进行去冗余的操作。当然,也可以不执行对事件报文进行去冗余的操作,仅执行对事件信息进行去冗余的操作。当然,两个去冗余操作均执行,同样适用于本申请实施例。
需要说明的是,上述实施例所提供方法的各步骤的执行主体均可以是同一设备,或者,该方法也由不同设备作为执行主体。比如,步骤41b至步骤43b的执行主体可以为设备A;又比如,步骤41b和42b的执行主体可以为设备A,步骤43b的执行主体可以为设备B;等等。
另外,在上述实施例及附图中的描述的一些流程中,包含了按照特定顺序出现的多个操作,但是应该清楚了解,这些操作可以不按照其在本文中出现的顺序来执行或并行执行,操作的序号如41b、42b等,仅仅是用于区分开各个不同的操作,序号本身不代表任何的执行顺序。另外,这些流程可以包括更多或更少的操作,并且这些操作可以按顺序执行或并行执行。需要说明的是,本文中的“第一”、“第二”等描述,是用于区分不同的消息、设备、模块等,不代表先后顺序,也不限定“第一”和“第二”是不同的类型。
图5a为本申请示例性实施例提供的一种数据处理设备的结构示意图。如图5a所示,该设备包括:存储器51a、处理器52a以及通信组件53a。
存储器51a,用于存储计算机程序,并可被配置为存储其它各种数据以支持在数据处理设备上的操作。这些数据的示例包括用于在数据处理设备上操作的任何应用程序或方法的指令,联系人数据,电话簿数据,消息,图片,视频等。
存储器51a可以由任何类型的易失性或非易失性存储设备或者它们的组合实现,如静态随机存取存储器(SRAM),电可擦除可编程只读存储器(EEPROM),可擦除可编程只读存储器(EPROM),可编程只读存储器 (PROM),只读存储器(ROM),磁存储器,快闪存储器,磁盘或光盘。
处理器52a,与存储器51a耦合,用于执行存储器51a中的计算机程序,以用于:通过通信组件53a接收网络交换设备发送的事件信息,事件信息用于描述经过网络交换设备的数据流发生设定事件的相关信息;保存事件信息,并面向网络管理员提供查询操作,以供网络管理员定位与设定事件相关的网络问题。
在一可选实施例中,事件信息包括以下至少一种:设定事件的类型、设定事件的详情信息以及发生设定事件的数据流信息。关于这些信息的详细说明,可参见前述实施例。
在一可选实施例中,查询操作包括以下至少一种:数据流维度的查询操作、事件维度的查询操作、设备维度的查询操作以及时间维度的查询操作。
在一可选实施例中,处理器52a在接收网络交换设备发送的事件信息时,具体用于:接收网络交换设备发送的数据包;从数据包解析出多个事件信息。通过数据包携带多个事件信息,可实现对事件信息的批处理,有利于减少数据传输量,有利于提高数据处理设备的吞吐量。
在一可选实施例中,处理器52a在保存事件信息之前,还用于:对事件信息进行去冗余处理。这可减少事件信息的冗余,节约存储资源。
进一步,如图5a所示,该数据处理设备还包括:显示器57a、电源组件58a、音频组件59a等其它组件。图5a中仅示意性给出部分组件,并不意味着数据处理设备只包括图5a所示组件。另外,图5a中虚线框内的组件为可选组件,而非必选组件,具体可视数据处理设备的产品形态而定。本实施例的数据处理设备可以实现为台式电脑、笔记本电脑、智能手机等终端设备,也可以是常规服务器、云服务器或服务器阵列等服务端设备。若本实施例的数据处理设备实现为台式电脑、笔记本电脑、智能手机等终端设备,可以包含图5a中虚线框内的组件;若本实施例的数据处理设备实现为常规服务器、云服务器或服务器阵列等服务端设备,则可以不包含图5a中虚线框内的组件。
相应地,本申请实施例还提供一种存储有计算机程序的计算机可读存储 介质,计算机程序被执行时能够实现图4a所示方法实施例中的各步骤。
图5b为本申请示例性实施例提供的另一种数据处理设备的结构示意图。如图5b所示,该设备包括:存储器51b、处理器52b以及通信组件53b。
存储器51b,用于存储计算机程序,并可被配置为存储其它各种数据以支持在数据处理设备上的操作。这些数据的示例包括用于在数据处理设备上操作的任何应用程序或方法的指令,联系人数据,电话簿数据,消息,图片,视频等。
存储器51b可以由任何类型的易失性或非易失性存储设备或者它们的组合实现,如静态随机存取存储器(SRAM),电可擦除可编程只读存储器(EEPROM),可擦除可编程只读存储器(EPROM),可编程只读存储器(PROM),只读存储器(ROM),磁存储器,快闪存储器,磁盘或光盘。
处理器52b,与存储器51b耦合,用于执行存储器51b中的计算机程序,以用于:通过通信组件53b接收网络交换设备发送的事件报文及其对应的事件元数据,事件报文是经过网络交换设备的数据流中发生设定事件的报文;从事件报文及其对应的事件元数据中提取事件信息,事件信息用于描述发生设定事件的相关信息;保存事件信息,并面向网络管理员提供查询操作,以供网络管理员定位与设定事件相关的网络问题。
在一可选实施例中,查询操作包括以下至少一种:数据流维度的查询操作、事件维度的查询操作、设备维度的查询操作以及时间维度的查询操作。
在一可选实施例中,处理器52b在从事件报文中提取事件信息之前,还用于:以一条数据流保留一个事件报文为目标,对事件报文进行去冗余处理,得到目标事件报文。相应地,处理器52b在从事件报文及其对应的事件元数据中提取事件信息时,具体用于:从目标事件报文及其对应的事件元数据中提取事件信息。
在一可选实施例中,处理器52b在保存事件信息之前,还用于:对事件信息进行去冗余处理。
进一步,如图5b所示,该数据处理设备还包括:显示器57b、电源组件58b、 音频组件59b等其它组件。图5b中仅示意性给出部分组件,并不意味着数据处理设备只包括图5b所示组件。另外,图5b中虚线框内的组件为可选组件,而非必选组件,具体可视数据处理设备的产品形态而定。本实施例的数据处理设备可以实现为台式电脑、笔记本电脑、智能手机等终端设备,也可以是常规服务器、云服务器或服务器阵列等服务端设备。若本实施例的数据处理设备实现为台式电脑、笔记本电脑、智能手机等终端设备,可以包含图5b中虚线框内的组件;若本实施例的数据处理设备实现为常规服务器、云服务器或服务器阵列等服务端设备,则可以不包含图5b中虚线框内的组件。
相应地,本申请实施例还提供一种存储有计算机程序的计算机可读存储介质,计算机程序被执行时能够实现图4b所示方法实施例中的各步骤。
上述图5a和图5b中的通信组件被配置为便于通信组件所在设备和其他设备之间有线或无线方式的通信。通信组件所在设备可以接入基于通信标准的无线网络,如WiFi,2G、3G、4G/LTE、5G等移动通信网络,或它们的组合。在一个示例性实施例中,通信组件经由广播信道接收来自外部广播管理系统的广播信号或广播相关信息。在一个示例性实施例中,所述通信组件还包括近场通信(NFC)模块,以促进短程通信。例如,在NFC模块可基于射频选取(RFID)技术,红外数据协会(IrDA)技术,超宽带(UWB)技术,蓝牙(BT)技术和其他技术来实现。
上述图5a和图5b中的显示器包括屏幕,其屏幕可以包括液晶显示器(LCD)和触摸面板(TP)。如果屏幕包括触摸面板,屏幕可以被实现为触摸屏,以接收来自用户的输入信号。触摸面板包括一个或多个触摸传感器以感测触摸、滑动和触摸面板上的手势。所述触摸传感器可以不仅感测触摸或滑动动作的边界,而且还检测与所述触摸或滑动操作相关的持续时间和压力。
上述图5a和图5b中的电源组件,为电源组件所在设备的各种组件提供电力。电源组件可以包括电源管理系统,一个或多个电源,及其他与为电源组件所在设备生成、管理和分配电力相关联的组件。
上述图5a和图5b中的音频组件,可被配置为输出和/或输入音频信号。例 如,音频组件包括一个麦克风(MIC),当音频组件所在设备处于操作模式,如呼叫模式、记录模式和语音选取模式时,麦克风被配置为接收外部音频信号。所接收的音频信号可以被进一步存储在存储器或经由通信组件发送。在一些实施例中,音频组件还包括一个扬声器,用于输出音频信号。
本领域内的技术人员应明白,本发明的实施例可提供为方法、系统、或计算机程序产品。因此,本发明可采用完全硬件实施例、完全软件实施例、或结合软件和硬件方面的实施例的形式。而且,本发明可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质(包括但不限于磁盘存储器、CD-ROM、光学存储器等)上实施的计算机程序产品的形式。
本发明是参照根据本发明实施例的方法、设备(系统)、和计算机程序产品的流程图和/或方框图来描述的。应理解可由计算机程序指令实现流程图和/或方框图中的每一流程和/或方框、以及流程图和/或方框图中的流程和/或方框的结合。可提供这些计算机程序指令到通用计算机、专用计算机、嵌入式处理机或其他可编程数据处理设备的处理器以产生一个机器,使得通过计算机或其他可编程数据处理设备的处理器执行的指令产生用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的装置。
这些计算机程序指令也可存储在能引导计算机或其他可编程数据处理设备以特定方式工作的计算机可读存储器中,使得存储在该计算机可读存储器中的指令产生包括指令装置的制造品,该指令装置实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能。
这些计算机程序指令也可装载到计算机或其他可编程数据处理设备上,使得在计算机或其他可编程设备上执行一系列操作步骤以产生计算机实现的处理,从而在计算机或其他可编程设备上执行的指令提供用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的步骤。
在一个典型的配置中,计算设备包括一个或多个处理器(CPU)、输入/输 出接口、网络接口和内存。
内存可能包括计算机可读介质中的非永久性存储器,随机存取存储器(RAM)和/或非易失性内存等形式,如只读存储器(ROM)或闪存(flash RAM)。内存是计算机可读介质的示例。
计算机可读介质包括永久性和非永久性、可移动和非可移动媒体可以由任何方法或技术来实现信息存储。信息可以是计算机可读指令、数据结构、程序的模块或其他数据。计算机的存储介质的例子包括,但不限于相变内存(PRAM)、静态随机存取存储器(SRAM)、动态随机存取存储器(DRAM)、其他类型的随机存取存储器(RAM)、只读存储器(ROM)、电可擦除可编程只读存储器(EEPROM)、快闪记忆体或其他内存技术、只读光盘只读存储器(CD-ROM)、数字多功能光盘(DVD)或其他光学存储、磁盒式磁带,磁带磁磁盘存储或其他磁性存储设备或任何其他非传输介质,可用于存储可以被计算设备访问的信息。按照本文中的界定,计算机可读介质不包括暂存电脑可读媒体(transitory media),如调制的数据信号和载波。
还需要说明的是,术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含,从而使得包括一系列要素的过程、方法、商品或者设备不仅包括那些要素,而且还包括没有明确列出的其他要素,或者是还包括为这种过程、方法、商品或者设备所固有的要素。在没有更多限制的情况下,由语句“包括一个……”限定的要素,并不排除在包括所述要素的过程、方法、商品或者设备中还存在另外的相同要素。
以上所述仅为本申请的实施例而已,并不用于限制本申请。对于本领域技术人员来说,本申请可以有各种更改和变化。凡在本申请的精神和原理之内所作的任何修改、等同替换、改进等,均应包含在本申请的权利要求范围之内。

Claims (56)

  1. 一种网络交换设备,其特征在于,包括:可编程的数据平面;所述可编程的数据平面被编程,以用于:
    从经过所述网络交换设备的数据流中,选取发生设定事件的事件报文;基于所述事件报文向数据处理端提供事件信息,所述事件信息用于描述发生所述设定事件的相关信息,可供定位与所述设定事件相关的网络问题。
  2. 根据权利要求1所述的设备,其特征在于,所述事件信息包括以下至少一种:所述设定事件的类型、所述设定事件的详情信息以及发生所述设定事件的数据流信息。
  3. 根据权利要求2所述的设备,其特征在于,所述设定事件包括以下至少一种类型:拥塞事件、暂停事件、丢包事件以及换路事件。
  4. 根据权利要求1所述的设备,其特征在于,所述可编程的数据平面为流水线结构。
  5. 根据权利要求4所述的设备,其特征在于,所述可编程的数据平面依次包括:入端流水线、缓存管理单元和出端流水线;
    所述入端流水线,用于在对经过所述网络交换设备的数据流进行报文接收处理的过程中,选取发生设定事件的事件报文,并将所述事件报文及其对应的事件元数据上报给所述出端流水线;
    所述缓存管理单元,用于在对经过所述网络交换设备的数据流进行报文交换处理的过程中,选取发生设定事件的事件报文,并将所述事件报文及其对应的事件元数据上报给所述出端流水线;
    所述出端流水线,用于在对经过所述网络交换设备的数据流进行报文发送处理的过程中,选取发生设定事件的事件报文,并根据本端选取出的事件报文及其对应的事件元数据和所述入端流水线以及所述缓存管理单元上报的事件报文及其对应的事件元数据向数据处理端上报事件信息。
  6. 根据权利要求5所述的设备,其特征在于,所述入端流水线包括以下 至少一个入端事件检测模块:
    入端流水线丢包检测模块,用于检测在所述报文接收处理的过程中是否发生流水线丢包事件,并在为是的情况下,生成事件元数据,将发生流水线丢包事件的报文作为事件报文连同所述事件元数据一并上报给所述出端流水线;
    暂停事件检测模块,用于检测在所述报文接收处理的过程中是否发生暂停事件,并在为是的情况下,生成事件元数据,将发生暂停事件的报文作为事件报文连同所述事件元数据一并上报给所述出端流水线。
  7. 根据权利要求6所述的设备,其特征在于,所述报文接收处理的过程包括针对接收到的各报文查路由表的过程和/或校验过程;
    所述入端流水线丢包检测模块具体用于:检测针对接收到的各报文查路由表的过程和/或校验过程中是否发生丢包,若是,确定发生流水线丢包事件。
  8. 根据权利要求6所述的设备,其特征在于,所述暂停事件检测模块具体用于:在接收到的报文需要被路由至目标出端口的情况下,检测所述目标出端口是否处于暂停发送状态,若是,确定发生暂停事件。
  9. 根据权利要求6所述的设备,其特征在于,所述入端流水线还包括:
    入端链路丢包检测模块,用于检测经过所述网络交换设备的数据流是否发生链路丢包事件,并在为是的情况下,向上游设备发送丢包通知消息,以通知上游设备发生了链路丢包事件。
  10. 根据权利要求9所述的设备,其特征在于,所述入端链路丢包检测模块具体用于:检测来自上游设备的报文的编号是否连续;若不连续,确定发生链路丢包事件。
  11. 根据权利要求5-10任一项所述的设备,其特征在于,所述缓存管理单元包括:缓存丢包检测模块;
    所述缓存丢包检测模块,用于检测在向所述网络交换设备的多个出端口对应的队列中缓存报文的过程中是否发生缓存丢包事件,并在为是的情况下,生成事件元数据,将发生缓存丢包事件的报文作为事件报文连同所述事件元 数据一并上报给所述出端流水线。
  12. 根据权利要求5-10任一项所述的设备,其特征在于,所述出端流水线包括:事件报文处理模块和以下至少一个出端事件检测模块:
    拥塞事件检测模块,用于检测所述网络交换设备的各出端口是否发生拥塞事件,并在为是的情况下,生成事件元数据,将发生拥塞事件的报文作为事件报文连同所述事件元数据一并发送给所述事件报文处理模块;
    换路事件检测模块,用于检测所述网络交换设备中是否发生换路事件,并在为是的情况下,生成事件元数据,将发生换路事件的报文作为事件报文连同所述事件元数据一并发送给所述事件报文处理模块;
    出端流水线丢包检测模块,用于检测在所述报文发送处理的过程中是否发生流水线丢包事件,并在为是的情况下,生成事件元数据,将发生流水线丢包事件的报文作为事件报文连同所述事件元数据一并发送给所述事件报文处理模块;
    出端链路丢包检测模块,用于检测在所述报文发送处理的过程中是否发生链路丢包事件,并在为是的情况下,生成事件元数据,将发生链路丢包事件的报文作为事件报文连同所述事件元数据一并发送给所述事件报文处理模块;
    所述事件报文处理模块,用于根据接收到的事件报文和所述事件元数据,向所述数据处理端提供事件信息。
  13. 根据权利要求12所述的设备,其特征在于,所述拥塞事件检测模块具体用于:针对各出端口,判断所述出端口对应的队列中报文的排队延时是否超出设定时延阈值,或者判断所述出端口对应的队列的长度是否超出设定的长度阈值;若是,确定所述出端口发生了拥塞事件。
  14. 根据权利要求12所述的设备,其特征在于,所述换路事件检测模块具体用于:针对每个待发送的报文,检测所述待发送的报文所属的数据流信息是否是第一次出现;若是,确定发生换路事件。
  15. 根据权利要求12所述的设备,其特征在于,所述报文发送处理的过 程包括对每个待发送的报文进行校验的过程;
    所述出端流水线丢包检测模块具体用于:检测在对每个待发送的报文进行校验的过程中是否发生丢包,若是,确定发生流水线丢包事件。
  16. 根据权利要求12所述的设备,其特征在于,所述出端链路丢包检测模块具体用于:在将每个待发送的报文发送出去之前,对每个待发送的报文进行编号,以供下游设备根据报文编号协助判断是否发生链路丢包事件;以及
    检测是否接收到下游设备在确定发生链路丢包事件时返回的丢包通知消息,若是,确定发生链路丢包事件。
  17. 根据权利要求16所述的设备,其特征在于,所述出端链路丢包检测模块具体用于:
    在本地缓存每个待发送的报文的编号及其所属的数据流信息;以及
    根据所述丢包通知消息中携带的缺失报文的编号,确定发生链路丢包的事件报文及其所属的数据流信息。
  18. 根据权利要求12所述的设备,其特征在于,所述事件报文处理模块具体用于:
    将接收到的事件报文及其对应的事件元数据发送给所述数据处理端,以供所述数据处理端从所述事件报文及其对应的事件元数据中提取事件信息;
    或者
    从接收到的事件报文及其对应的事件元数据中提取事件信息,并将所述事件信息提供给所述数据处理端。
  19. 根据权利要求18所述的设备,其特征在于,所述事件报文处理模块还用于:在将接收到的事件报文及其对应的事件元数据发送给所述数据处理端之前,或者在从接收到的事件报文及其对应的事件元数据中提取事件信息之前,以一条数据流保留一个事件报文为目标,对接收到的事件报文进行去冗余处理,得到目标事件报文。
  20. 根据权利要求19所述的设备,其特征在于,所述事件报文处理模块 包括:去冗余子模块和第一信息表,所述第一信息表中的每个表项用于记录一条数据流信息及其对应的事件报文数量;
    所述去冗余子模块用于:针对接收到的每个事件报文,将所述事件报文所属的数据流信息的哈希值作为索引,在所述第一信息表中进行匹配;
    若未匹配到对应的目标表项,将所述事件报文作为目标事件报文,并将所述事件报文所属的数据流信息记录到一个空表项中,开始对事件报文数量进行计数;
    若匹配到对应的目标表项,且目标表项中记录的数据流信息与所述事件报文所属的数据流信息相同,则将目标表项对应的事件报文数量加1;
    若匹配到对应的目标表项,但目标表项中记录的数据流信息与所述事件报文所属的数据流信息不相同,则将目标表项对应的事件报文数量减1;以及若减1后的事件报文数量为0,则将所述事件报文作为目标事件报文,并将目标表项中记录的数据流信息替换为所述事件报文所属的数据流信息,并重新对事件报文数量进行计数。
  21. 根据权利要求20所述的设备,其特征在于,所述事件报文处理模块还包括:事件提取子模块、事件栈以及批处理子模块,所述事件栈包括栈顶计数器和至少一个栈块;
    所述事件提取子模块,用于从所述目标事件报文及其对应的事件元数据中提取事件信息,将所述事件信息存储至所述至少一个栈块中;
    所述栈顶计数器,用于记录所述至少一个栈块中暂存的事件信息的个数;
    所述批处理子模块,用于从所述至少一个栈块中提取指定数量个事件信息,将所述指定数量个事件信息拼接成一个数据包,将所述数据包提供给所述数据处理端。
  22. 根据权利要求21所述的设备,其特征在于,所述事件提取子模块具体用于:从所述目标事件报文中提取发生所述设定事件的数据流信息,并从所述目标事件报文对应的事件元数据中提取所述设定事件的类型和所述设定事件的详情信息,作为所述事件信息。
  23. 根据权利要求21所述的设备,其特征在于,所述事件提取子模块具体用于:在所述事件信息的大小大于所述栈块的最大位宽时,将所述事件信息拆分为多个信息块,将所述多个信息块存储至多个栈块中;每个信息块的大小小于或等于所述最大位宽。
  24. 根据权利要求21所述的设备,其特征在于,还包括:控制平面;
    所述批处理子模块具体用于:
    直接将所述数据包发送给所述数据处理端;
    或者
    将所述数据包上报给所述控制平面,以供所述控制平面将所述数据包发送给所述数据处理端。
  25. 根据权利要求24所述的设备,其特征在于,所述控制平面包括:处理器和存储器;
    所述存储器,用于存储计算机程序,所述处理器执行所述计算机程序,以用于:
    对所述数据包中的事件信息进行去冗余处理,得到新的数据包;并将新的数据包发送给所述数据处理端。
  26. 根据权利要求25所述的设备,其特征在于,所述处理器具体用于:
    从所述数据包中解析出所述指定数量个事件信息;
    针对解析出的每个事件信息,检查第二信息表中是否已有相应记录;若是,则丢弃所述事件信息;
    将未被丢弃的事件信息重新封装为新的数据包;其中,所述第二信息表记录有已经发送给所述数据处理端的事件信息。
  27. 根据权利要求26所述的设备,其特征在于,所述处理器还用于:将未被丢弃的事件信息记录到所述第二信息表中。
  28. 根据权利要求26所述的设备,其特征在于,所述处理器还用于:
    在发送所述新的数据包过程中,对所述新的数据包进行流量整形。
  29. 一种信息处理方法,适用于网络交换设备,其特征在于,所述网络交换设备具有可编程的数据平面,所述方法由被编程后的数据平面实现,所述方法包括:
    从经过所述网络交换设备的数据流中,选取发生设定事件的事件报文;
    基于所述事件报文向数据处理端提供事件信息,所述事件信息用于描述发生所述设定事件的相关信息,可供定位与所述设定事件相关的网络问题。
  30. 根据权利要求29所述的方法,其特征在于,所述事件信息包括以下至少一种:所述设定事件的类型、所述设定事件的详情信息以及发生所述设定事件的数据流信息。
  31. 根据权利要求30所述的方法,其特征在于,所述设定事件包括以下至少一种类型:拥塞事件、暂停事件、丢包事件以及换路事件。
  32. 根据权利要求29所述的方法,其特征在于,从经过所述网络交换设备的数据流中,选取发生设定事件的事件报文,包括以下至少一种选取操作:
    在对经过所述网络交换设备的数据流进行报文接收处理的过程中,选取发生设定事件的事件报文并生成所述事件报文对应的事件元数据;
    在对经过所述网络交换设备的数据流进行报文交换处理的过程中,选取发生设定事件的事件报文并生成所述事件报文对应的事件元数据;
    在对经过所述网络交换设备的数据流进行报文发送处理的过程中,选取发生设定事件的事件报文并生成所述事件报文对应的事件元数据。
  33. 根据权利要求32所述的方法,其特征在于,在对经过所述网络交换设备的数据流进行报文接收处理的过程中,选取发生设定事件的事件报文,包括以下至少一种操作:
    检测在所述报文接收处理的过程中是否发生流水线丢包事件,并在为是的情况下,将发生流水线丢包事件的报文作为事件报文;
    检测在所述报文接收处理的过程中是否发生暂停事件,并在为是的情况下,将发生暂停事件的报文作为事件报文。
  34. 根据权利要求32所述的方法,其特征在于,在对经过所述网络交换 设备的数据流进行报文交换处理的过程中,选取发生设定事件的事件报文,包括:
    检测在向所述网络交换设备的多个出端口对应的队列中缓存报文的过程中是否发生缓存丢包事件,在为是的情况下,将发生缓存丢包事件的报文作为事件报文。
  35. 根据权利要求32所述的方法,其特征在于,在对经过所述网络交换设备的数据流进行报文发送处理的过程中,选取发生设定事件的事件报文,包括以下至少一种操作:
    检测所述网络交换设备的各出端口是否发生拥塞事件,并在为是的情况下,将发生拥塞事件的报文作为事件报文;
    检测所述网络交换设备中是否发生换路事件,并在为是的情况下,将发生换路事件的报文作为事件报文;
    检测在所述报文发送处理的过程中是否发生流水线丢包事件,并在为是的情况下,将发生流水线丢包事件的报文作为事件报文;
    检测在所述报文发送处理的过程中是否发生链路丢包事件,并在为是的情况下,将发生链路丢包事件的报文作为事件报文。
  36. 根据权利要求32-35任一项所述的方法,其特征在于,基于所述事件报文向数据处理端提供事件信息,包括:
    将所述事件报文及其对应的事件元数据发送给所述数据处理端,以供所述数据处理端从所述事件报文及其对应的事件元数据中提取事件信息;
    或者
    从所述事件报文及其对应的事件元数据中提取事件信息,并将所述事件信息提供给所述数据处理端。
  37. 根据权利要求36所述的方法,其特征在于,在将所述事件报文及其对应的事件元数据发送给所述数据处理端之前,或者在从所述事件报文及其对应的事件元数据中提取事件信息之前,所述方法还包括:
    以一条数据流保留一个事件报文为目标,对事件报文进行去冗余处理, 得到目标事件报文。
  38. 根据权利要求37所述的方法,其特征在于,以一条数据流保留一个事件报文为目标,对事件报文进行去冗余处理,得到目标事件报文,包括:
    针对每个事件报文,将所述事件报文所属的数据流信息的哈希值作为索引,在第一信息表中进行匹配;所述第一信息表中的每个表项用于记录一条数据流信息及其对应的事件报文数量;
    若未匹配到对应的目标表项,将所述事件报文作为目标事件报文,并将所述事件报文所属的数据流信息记录到一个空表项中,开始对事件报文数量进行计数;
    若匹配到对应的目标表项,且目标表项中记录的数据流信息与所述事件报文所属的数据流信息相同,则将目标表项对应的事件报文数量加1;
    若匹配到对应的目标表项,但目标表项中记录的数据流信息与所述事件报文所属的数据流信息不相同,则将目标表项对应的事件报文数量减1;以及若减1后的事件报文数量为0,则将所述事件报文作为目标事件报文,并将目标表项中记录的数据流信息替换为所述事件报文所属的数据流信息,并重新对事件报文数量进行计数。
  39. 根据权利要求38所述的方法,其特征在于,从所述事件报文及其对应的事件元数据中提取事件信息之后,还包括:将所述事件信息存储至事件栈中的至少一个栈块中;
    将所述事件信息提供给所述数据处理端,包括:
    从所述至少一个栈块中提取指定数量个事件信息,将所述指定数量个事件信息拼接成一个数据包,将所述数据包提供给所述数据处理端。
  40. 根据权利要求39所述的方法,其特征在于,将所述数据包提供给所述数据处理端,包括:
    直接将所述数据包发送给所述数据处理端;
    或者
    将所述数据包上报给所述网络交换设备的控制平面,以供所述控制平面 将所述数据包发送给所述数据处理端。
  41. 根据权利要求40所述的方法,其特征在于,还包括:
    所述控制平面对所述数据包中的事件信息进行去冗余处理,得到新的数据包;
    所述控制平面将新的数据包发送给所述数据处理端。
  42. 根据权利要求41所述的方法,其特征在于,所述控制平面对所述数据包中的事件信息进行去冗余处理,得到新的数据包,包括:
    从所述数据包中解析出所述指定数量个事件信息;
    针对解析出的每个事件信息,检查第二信息表中是否已有相应记录;若是,则丢弃所述事件信息;
    将未被丢弃的事件信息重新封装为新的数据包;其中,所述第二信息表记录有已经发送给所述数据处理端的事件信息。
  43. 根据权利要求42所述的方法,其特征在于,还包括:
    所述控制平面在发送所述新的数据包过程中,对所述新的数据包进行流量整形。
  44. 一种信息处理方法,适用于数据处理端,其特征在于,所述方法包括:
    接收网络交换设备发送的事件信息,所述事件信息用于描述经过所述网络交换设备的数据流发生设定事件的相关信息;
    保存所述事件信息,并面向网络管理员提供查询操作,以供网络管理员定位与所述设定事件相关的网络问题。
  45. 根据权利要求44所述的方法,其特征在于,所述事件信息包括以下至少一种:所述设定事件的类型、所述设定事件的详情信息以及发生所述设定事件的数据流信息。
  46. 根据权利要求45所述的方法,其特征在于,所述查询操作包括以下至少一种:数据流维度的查询操作、事件维度的查询操作、设备维度的查询 操作以及时间维度的查询操作。
  47. 根据权利要求44所述的方法,其特征在于,接收网络交换设备发送的事件信息,包括:
    接收网络交换设备发送的数据包;
    从所述数据包解析出多个所述事件信息。
  48. 根据权利要求44-47任一项所述的方法,其特征在于,在保存所述事件信息之前,还包括:对所述事件信息进行去冗余处理。
  49. 一种信息处理方法,适用于数据处理端,其特征在于,所述方法包括:
    接收网络交换设备发送的事件报文及其对应的事件元数据,所述事件报文是经过所述网络交换设备的数据流中发生设定事件的报文;
    从所述事件报文及其对应的事件元数据中提取事件信息,所述事件信息用于描述发生所述设定事件的相关信息;
    保存所述事件信息,并面向网络管理员提供查询操作,以供网络管理员定位与所述设定事件相关的网络问题。
  50. 根据权利要求49所述的方法,其特征在于,在从所述事件报文及其对应的事件元数据中提取事件信息之前,还包括:
    以一条数据流保留一个事件报文为目标,对所述事件报文进行去冗余处理,得到目标事件报文;
    从所述事件报文及其对应的事件元数据中提取事件信息,具体为:从所述目标事件报文及其对应的事件元数据中提取事件信息。
  51. 根据权利要求49或50所述的方法,其特征在于,在保存所述事件信息之前,还包括:对所述事件信息进行去冗余处理。
  52. A data processing device, comprising: a memory, a processor, and a communication component;
    the memory being configured to store a computer program;
    the processor being coupled to the memory and configured to execute the computer program to:
    receive, through the communication component, event information sent by a network switching device, the event information being used to describe information related to a set event occurring on a data flow passing through the network switching device;
    store the event information, and provide query operations to a network administrator, so that the network administrator can locate network problems related to the set event.
  53. A data processing device, comprising: a memory, a processor, and a communication component;
    the memory being configured to store a computer program;
    the processor being coupled to the memory and configured to execute the computer program to:
    receive, through the communication component, an event packet and its corresponding event metadata sent by a network switching device, the event packet being a packet, in a data flow passing through the network switching device, on which a set event occurs;
    extract event information from the event packet and its corresponding event metadata, the event information being used to describe information related to the occurrence of the set event;
    store the event information, and provide query operations to a network administrator, so that the network administrator can locate network problems related to the set event.
  54. A computer-readable storage medium storing a computer program, wherein when the computer program is executed by a processor, the processor is caused to implement the steps in the method according to any one of claims 44-51.
  55. A configuration method, applicable to a network switching device, wherein the network switching device comprises a programmable data plane, the configuration method comprising:
    in response to a configuration operation, obtaining a configuration file required by the programmable data plane;
    configuring the configuration file into the programmable data plane to complete the configuration operation;
    wherein the programmable data plane is configured to: select, from the data flows passing through the network switching device, an event packet on which a set event occurs; and provide event information to a data processing end based on the event packet, the event information being used to describe information related to the occurrence of the set event and being usable for locating network problems related to the set event.
  56. A data center system, comprising: multiple servers, multiple network switching devices, and a data processing device, wherein the multiple servers and the data processing device are each communicatively connected to the multiple network switching devices;
    at least some of the multiple network switching devices comprise a programmable data plane, and the programmable data plane is programmed to:
    select, from the data flows passing through the network switching device to which the programmable data plane belongs, an event packet on which a set event occurs; and provide event information to the data processing device based on the event packet, the event information being used to describe information related to the occurrence of the set event and being usable for locating network problems related to the set event;
    and the data processing device is configured to obtain the event information provided by the programmable data plane, store the event information, and provide query operations to a network administrator, so that the network administrator can locate network problems related to the set event.
PCT/CN2020/083981 2020-02-07 2020-04-09 Information processing method, device, system, and storage medium WO2021155637A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010082309.5 2020-02-07
CN202010082309.5A CN113259143B (zh) 2020-02-07 2020-02-07 Information processing method, device, system, and storage medium

Publications (1)

Publication Number Publication Date
WO2021155637A1 true WO2021155637A1 (zh) 2021-08-12

Family

ID=77200709

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/083981 WO2021155637A1 (zh) 2020-02-07 2020-04-09 信息处理方法、设备、系统及存储介质

Country Status (2)

Country Link
CN (1) CN113259143B (zh)
WO (1) WO2021155637A1 (zh)


Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114189426B (zh) * 2021-10-29 2023-08-11 苏州浪潮智能科技有限公司 Proxy service adaptive configuration reply method, system, apparatus, and storage medium
CN117041272B (zh) * 2023-10-07 2024-01-30 腾讯科技(深圳)有限公司 Data processing method, apparatus, device, and storage medium


Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105812179B (zh) * 2016-03-09 2019-02-15 中国科学院信息工程研究所 Protocol-independent forwarding network event processing method
DE112019001214T5 (de) * 2018-03-08 2020-11-19 Barefoot Networks, Inc. Generation of a path failure notification to a forwarding element
CN108768714A (zh) * 2018-05-22 2018-11-06 郑州云海信息技术有限公司 Data center integrated management system and network security implementation method thereof
CN109495311B (zh) * 2018-11-30 2022-05-20 锐捷网络股份有限公司 Network fault detection method and apparatus
CN109787833B (zh) * 2019-01-23 2020-05-08 清华大学 Network anomaly event sensing method and system
CN110493140A (zh) * 2019-08-26 2019-11-22 中国人民解放军国防科技大学 Method for sensing link events in an information network system and operating system thereof

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030016628A1 (en) * 2001-07-23 2003-01-23 Broadcom Corporation Flow based congestion control
CN110708248A (zh) * 2014-06-26 2020-01-17 华为技术有限公司 Quality of service control method and device for software-defined networking
CN106487572A (zh) * 2015-09-02 2017-03-08 中兴通讯股份有限公司 Packet processing method and apparatus
CN108471389A (zh) * 2018-03-12 2018-08-31 电子科技大学 Switch system based on service function chaining
CN110661716A (zh) * 2019-09-16 2020-01-07 锐捷网络股份有限公司 Network packet loss notification method, monitoring apparatus, switch, and storage medium

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113645100A (zh) * 2021-08-13 2021-11-12 福建天泉教育科技有限公司 Metadata-tag-based full-link stress testing scheme and system
CN113740748A (zh) * 2021-09-03 2021-12-03 深圳市新威尔电子有限公司 Battery testing method based on sending messages over a CAN bus
CN113740748B (zh) * 2021-09-03 2024-04-26 深圳市新威尔电子有限公司 Battery testing method based on sending messages over a CAN bus
CN114389972A (zh) * 2022-02-22 2022-04-22 清华大学 Packet loss detection method and apparatus, and storage medium
CN114389972B (zh) * 2022-02-22 2024-03-26 清华大学 Packet loss detection method and apparatus, and storage medium
CN115277504A (zh) * 2022-07-11 2022-11-01 京东科技信息技术有限公司 Network traffic monitoring method, apparatus, and system
CN115277504B (zh) * 2022-07-11 2024-04-05 京东科技信息技术有限公司 Network traffic monitoring method, apparatus, and system
CN115955419A (zh) * 2023-03-08 2023-04-11 湖南磐云数据有限公司 Data center bandwidth traffic proactive alerting and abnormal traffic monitoring system

Also Published As

Publication number Publication date
CN113259143A (zh) 2021-08-13
CN113259143B (zh) 2023-04-18

Similar Documents

Publication Publication Date Title
WO2021155637A1 (zh) Information processing method, device, system, and storage medium
US11044204B1 (en) Visibility packets with inflated latency
US8619579B1 (en) De-duplicating of packets in flows at layer 3
US9071529B2 (en) Method and apparatus for accelerating forwarding in software-defined networks
US9065745B2 (en) Network traffic distribution
US10999200B2 (en) Offline, intelligent load balancing of SCTP traffic
WO2018121068A1 (zh) 确定传输路径的方法和装置
US9717011B2 (en) Event management in telecommunications networks
US10764209B2 (en) Providing a snapshot of buffer content in a network element using egress mirroring
US9515919B2 (en) Method and apparatus for protection switching in packet transport system
CN110557342B (zh) 用于分析和减轻丢弃的分组的设备
US9350631B2 (en) Identifying flows causing undesirable network events
WO2014000399A1 (zh) 链路选择方法和装置
CN103281257A (zh) 一种协议报文处理方法和设备
US20160248652A1 (en) System and method for classifying and managing applications over compressed or encrypted traffic
WO2023226633A1 (zh) 故障处理方法、相关设备和系统
WO2022152230A1 (zh) 信息流识别方法、网络芯片及网络设备
US11206176B2 (en) Preventing failure processing delay
US11218394B1 (en) Dynamic modifications to directional capacity of networking device interfaces
WO2017088489A1 (zh) 一种数据报文传输方法、系统及通信系统
US20240146655A1 (en) Telemetry-based congestion source detection
Dong et al. An Enhanced Data Plane for Network Event Processing in Software Defined Networking
CN114826957A (zh) Redundant packet detection method applied to a lossless communication network

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20918016

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20918016

Country of ref document: EP

Kind code of ref document: A1