WO2021238263A1 - Link detection method and system - Google Patents

Link detection method and system

Info

Publication number
WO2021238263A1
WO2021238263A1 PCT/CN2021/073447 CN2021073447W WO2021238263A1 WO 2021238263 A1 WO2021238263 A1 WO 2021238263A1 CN 2021073447 W CN2021073447 W CN 2021073447W WO 2021238263 A1 WO2021238263 A1 WO 2021238263A1
Authority
WO
WIPO (PCT)
Prior art keywords
server
program
service interface
side program
interface
Prior art date
Application number
PCT/CN2021/073447
Other languages
English (en)
French (fr)
Inventor
李奇
张连聘
侯绍铮
Original Assignee
苏州浪潮智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 苏州浪潮智能科技有限公司
Priority to US17/927,344 (US11792098B2)
Publication of WO2021238263A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0805 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters, by checking availability
    • H04L43/0811 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters, by checking availability by checking connectivity
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0823 Errors, e.g. transmission errors
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/12 Discovery or management of network topologies
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/10 Active monitoring, e.g. heartbeat, ping or trace-route
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L49/00 Packet switching elements
    • H04L49/55 Prevention, detection or correction of errors
    • H04L49/555 Error detection

Definitions

  • the present invention relates to the technical field of network links, and more specifically, to a link detection method and system.
  • The TOR (top-of-rack) switch, which sits in the server cabinet, connects the servers in the cabinet to the upper-level network in the data center.
  • The network management switch connects the network management interfaces of all equipment in the cabinet, so that the equipment can be controlled and managed remotely.
  • The TOR switch connects the service interfaces of all servers in the cabinet and carries service data between the servers and the upper-level network.
  • The link (specifically, the network link) between the TOR switch and a server may fail, which can directly interrupt server services and cause economic losses.
  • The purpose of the present invention is to provide a link detection method and system that use out-of-band communication to supply the information needed for fault detection and diagnosis of the links between the TOR switch and the servers. This enables effective monitoring of those links, indirectly improves their reliability, and to some extent avoids server service interruptions caused by link faults.
  • A link detection method includes:
  • the server-side program obtains the connection relationship between the service interfaces of the TOR switch and the service interfaces of each server; the server-side program runs on any one device, and a client program runs on every other device;
  • the devices include the TOR switch and the servers;
  • the server-side program periodically sends a query message to each client program through the network management switch; each client program responds to the query message by returning to the server-side program, through the network management switch, the interface information of those service interfaces of its own device that have a connection relationship with service interfaces of other devices;
  • the server-side program reports to the upper-layer program both the received interface information and the interface information of those service interfaces of its own device that have a connection relationship with service interfaces of other devices, so that the upper-layer program can determine, based on the interface information, the connection status of the link corresponding to each connection relationship.
  • The method further includes:
  • the server-side program periodically exchanges keep-alive messages with each client program through the network management switch. If no keep-alive message from a given client program is received within the specified time, the server-side program determines that its connection to that client program through the network management switch has been lost.
  • the server-side program obtains the connection relationship between the service interface of the TOR switch and the service interface of each server, including:
  • the server-side program closes the service interfaces of the TOR switch one at a time. After closing a service interface of the TOR switch, the server-side program queries, through the network management switch, which server service interface has lost its connection, and determines that the closed TOR switch service interface and that server service interface have a connection relationship.
  • the server-side program obtains the connection relationship between the service interface of the TOR switch and the service interface of each server, including:
  • the server-side program obtains a preset connection relationship table, and obtains the connection relationship between the service interface of the TOR switch and the service interface of each server from the connection relationship table.
  • the server-side program obtains the connection relationship between the service interface of the TOR switch and the service interface of each server, including:
  • the server-side program queries each client program, through the network management switch, for the MAC addresses of the service interfaces of the device on which that client program runs, and establishes the correspondence between the MAC address of each server service interface and the MAC address of the corresponding TOR switch service interface, thereby establishing the connection relationship between the service interfaces of the TOR switch and the service interfaces of each server.
  • The method further includes:
  • when the interface information of a service interface of any server changes: if that server runs the server-side program, the server-side program reports the changed interface information to the upper-layer program; if that server runs a client program, the client program sends the changed interface information to the server-side program, which reports it to the upper-layer program.
  • The method further includes:
  • when communication occurs between a server and the TOR switch, the server-side program records the corresponding communication information in the communication log.
  • Each device includes both the server-side program and the client program. At any given time, the server-side program runs normally on only one device; every other device runs its client program.
  • a link detection system includes a server-side program and a plurality of client-side programs.
  • The server-side program runs on any one device, and a client program runs on every other device.
  • The devices include the TOR switch and the servers, wherein:
  • the server-side program is used to: obtain the connection relationship between the service interfaces of the TOR switch and the service interfaces of each server; periodically send a query message to each client program through the network management switch; and report to the upper-layer program both the received interface information and the interface information of those service interfaces of its own device that have a connection relationship with service interfaces of other devices, so that the upper-layer program can determine, based on the interface information, the connection status of the link corresponding to each connection relationship;
  • each client program is used to respond to the query message by returning to the server-side program, through the network management switch, the interface information of those service interfaces of its own device that have a connection relationship with service interfaces of other devices.
  • the present invention provides a link detection method and system.
  • The method includes: a server-side program obtains the connection relationship between the service interfaces of a TOR switch and the service interfaces of each server, where the server-side program runs on any one device, a client program runs on every other device, and the devices include the TOR switch and the servers;
  • the server-side program periodically sends a query message to each client program through the network management switch;
  • each client program responds to the query message by returning to the server-side program, through the network management switch, the interface information of those service interfaces of its own device that have a connection relationship with service interfaces of other devices;
  • the server-side program reports to the upper-layer program both the received interface information and the interface information of those service interfaces of its own device that have a connection relationship with service interfaces of other devices, so that the upper-layer program can determine, based on the interface information, the connection status of the link corresponding to each connection relationship.
  • This application uses the network management switch to carry the communication between the server-side program and the client programs, which run on the TOR switch and the servers. In this way, the interface information of the service interfaces connecting the TOR switch and each server is obtained and passed to the upper-layer program, which monitors the corresponding links based on that information and can respond promptly when a link connection becomes abnormal.
  • Because the TOR switch and each server communicate through the network management switch, the state of the link between the TOR switch and a server can still be learned even if that directly connected link fails. This out-of-band communication supplies the information needed for fault detection and diagnosis of the link, enables effective monitoring of it, indirectly improves its reliability, and to some extent avoids server service interruptions caused by the link.
  • FIG. 1 is a flowchart of a link detection method provided by an embodiment of the present invention
  • FIG. 2 is a schematic diagram of a program environment in a link detection method provided by an embodiment of the present invention
  • FIG. 3 is a schematic structural diagram of a link detection device provided by an embodiment of the present invention.
  • FIG. 1 shows a flowchart of a link detection method provided by an embodiment of the present invention, which may include:
  • The server-side program obtains the connection relationship between the service interfaces of the TOR switch and the service interfaces of each server; the server-side program runs on any one device, a client program runs on every other device, and the devices include the TOR switch and the servers.
  • the client program and the server program may specifically run on the TOR switch in the entire cabinet and the corresponding BMC (Baseboard Management Controller) chip of each server; the number of servers is 1
  • the program environment of the client program and the server program in the embodiment of the present application can be shown in Figure 2.
  • In Figure 2, the business network interface is the service interface, and the BMC network management interface is referred to as the network management interface for short.
  • A server can communicate with the network management switch through its network management interface and with the TOR switch through its service interface.
  • Correspondingly, the TOR switch includes service interfaces that connect to the service interfaces of the servers; connecting the service interfaces of the two devices forms a link over which the two devices communicate.
  • This application is designed to detect the connection between a server and the TOR switch through the service interfaces. When the server-side program obtains the connection relationship between the service interfaces of the TOR switch and the service interfaces of each server, it learns which service interface of which server each service interface of the TOR switch is connected to. From the connection relationship, the service interfaces at both ends of the link between the TOR switch and any server can be identified, that is, the interconnected service interfaces of the two kinds of devices can be located.
  • A server-side program and client programs that can communicate with each other can be provided.
  • The server-side program communicates with each client program over the network management network provided by the network management switch, that is, out of band. Generally, among the TOR switch and the servers connected to it through service interfaces, only one device runs the server-side program; the other devices run client programs, and the communication between the server-side program and the client programs is used to monitor each service interface of the servers and the TOR switch.
  • The server-side program periodically sends a query message to each client program through the network management switch; each client program responds to the query message by returning to the server-side program, through the network management switch, the interface information of those service interfaces of its own device that have a connection relationship with service interfaces of other devices.
  • The server-side program can communicate with each client program periodically over the network management network provided by the network management switch to obtain the interface status information (referred to as interface information) of the service interfaces of the device on which each client program runs. Specifically, the server-side program can periodically send communication messages to each client program as query messages.
  • A query message can be carried over TCP (Transmission Control Protocol) or UDP (User Datagram Protocol) and can include sub-protocol communication information such as the length of the protocol message content, the interface number, the interface MAC (Media Access Control) address, a flag bit (query flag, control flag, or feedback flag), and instruction information (a query or control instruction, or feedback information).
  • The information queried in this application is the interface information of the service interfaces of the device whose client program received the query message.
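  • As a minimal sketch of how such a message could be laid out, the Python below packs and unpacks the fields the description lists (content length, interface number, interface MAC address, flag, instruction). The byte layout, the flag values, and the names `pack_message` and `unpack_message` are assumptions for illustration; the source does not specify a wire format.

```python
import struct

# Hypothetical flag values; the description only names the three flag kinds.
FLAG_QUERY, FLAG_CONTROL, FLAG_FEEDBACK = 1, 2, 3

# Assumed header layout (network byte order):
# content length (2 bytes), interface number (2 bytes), MAC (6 bytes), flag (1 byte).
_HEADER = struct.Struct("!HH6sB")


def pack_message(if_number: int, mac: bytes, flag: int, instruction: bytes = b"") -> bytes:
    """Serialize one message; the length field counts the instruction bytes."""
    if len(mac) != 6:
        raise ValueError("MAC address must be 6 bytes")
    return _HEADER.pack(len(instruction), if_number, mac, flag) + instruction


def unpack_message(data: bytes) -> dict:
    """Parse a message produced by pack_message."""
    length, if_number, mac, flag = _HEADER.unpack_from(data)
    body = data[_HEADER.size:_HEADER.size + length]
    if len(body) != length:
        raise ValueError("truncated message")
    return {"interface": if_number, "mac": mac, "flag": flag, "instruction": body}
```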
  • The server-side program can also send communication messages to each client program as control messages.
  • After a client program receives a message carrying the control flag, it can perform the control action the message specifies, including but not limited to modifying the corresponding interface register, changing the interface speed, or changing the interface working mode (simplex/duplex, whether energy-saving mode is on, and so on). After performing the control action, the client program can send feedback indicating whether the operation succeeded to the server-side program, so that the upper-layer program can apply the correct strategy. For example, if an interface's link-layer connection is down, the received optical power can be used to judge whether a signal is still being received; if it is, the link-layer working mode can be modified to attempt automatic adaptation.
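  • The control-and-feedback exchange can be sketched as follows. This is only an illustration: the `ServiceInterface` fields, the action names `set_speed` and `set_duplex`, and the dictionary-shaped messages are hypothetical and not taken from the source.

```python
from dataclasses import dataclass


@dataclass
class ServiceInterface:
    # Hypothetical settings that a control message might change.
    speed_mbps: int = 10000
    duplex: str = "full"


def handle_control(iface: ServiceInterface, instruction: dict) -> dict:
    """Apply one control instruction and build the feedback message."""
    action = instruction.get("action")
    try:
        if action == "set_speed":
            iface.speed_mbps = int(instruction["value"])
        elif action == "set_duplex":
            if instruction["value"] not in ("half", "full"):
                raise ValueError("unsupported duplex mode")
            iface.duplex = instruction["value"]
        else:
            raise ValueError(f"unknown action: {action!r}")
    except (KeyError, ValueError) as exc:
        # Feedback tells the server-side program that the action failed.
        return {"flag": "feedback", "action": action, "ok": False, "error": str(exc)}
    return {"flag": "feedback", "action": action, "ok": True}
```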
  • The server-side program reports to the upper-layer program both the received interface information and the interface information of those service interfaces of its own device that have a connection relationship with service interfaces of other devices, so that the upper-layer program can determine, based on the interface information, the connection status of the link corresponding to each connection relationship.
  • After the server-side program has collected the interface information of the relevant service interfaces from the device of each client program, it also obtains the interface information of its own device's service interfaces that have a connection relationship with service interfaces of other devices. It then reports its own interface information together with the interface information sent by the client programs to the upper-layer program, so that the upper-layer program can determine from all the interface information whether the link between each pair of connected service interfaces is in a normal connection state and, when an abnormal situation such as a disconnected link occurs, arrange repair or other handling in time.
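  • One way the upper-layer program might combine the connection relationships with the reported interface information is sketched below. The rule used here (a link is "up" only when both ends report link-up, and "unknown" when an end did not report) is an assumption, as are the `link_up` field and the `classify_links` name.

```python
def classify_links(connections, interface_info):
    """Classify each connected interface pair as "up", "down" or "unknown".

    connections: iterable of (tor_port, server, server_port) tuples.
    interface_info: dict mapping (device, port) to {"link_up": bool};
                    the TOR switch is keyed as device "tor".
    """
    result = {}
    for tor_port, server, server_port in connections:
        ends = [
            interface_info.get(("tor", tor_port)),
            interface_info.get((server, server_port)),
        ]
        if any(end is None for end in ends):
            state = "unknown"  # at least one end did not report
        elif all(end["link_up"] for end in ends):
            state = "up"
        else:
            state = "down"
        result[(tor_port, server, server_port)] = state
    return result
```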
  • This application uses the network management switch to carry the communication between the server-side program and the client programs, which run on the TOR switch and the servers. In this way, the interface information of the service interfaces connecting the TOR switch and each server is obtained and passed to the upper-layer program, which monitors the corresponding links based on that information and can respond promptly when a link connection becomes abnormal.
  • Because the TOR switch and each server communicate through the network management switch, the state of the link between the TOR switch and a server can still be learned even if that directly connected link fails. This out-of-band communication supplies the information needed for fault detection and diagnosis of the link, enables effective monitoring of it, indirectly improves its reliability, and to some extent avoids server service interruptions caused by the link.
  • The server-side program periodically exchanges keep-alive messages with each client program through the network management switch. If no keep-alive message from a given client program is received within the specified time, the server-side program determines that its connection to that client program through the network management switch has been lost.
  • The server-side program can also periodically exchange keep-alive messages with each client program over the network management network provided by the network management switch. Specifically, the server-side program periodically sends a keep-alive message to each client program through the network management switch, and each client program returns a keep-alive message after receiving one; this shows whether the server-side program and the client program can communicate normally through the network management switch. If, within a preset time period after sending a keep-alive message (the specified time, which can be set according to actual needs), the server-side program receives no keep-alive message back from a client program, the connection between that client program and the server-side program is considered lost and the two can no longer communicate. This situation can also be reported to the upper-layer program so that the abnormality is reported in time.
  • Specifically, the server-side program can read the MAC table saved in the network management switch and obtain from it the MAC addresses of the devices connected to all interfaces except the uplink port, that is, the MAC addresses of the devices on which the client programs run. It uses these MAC addresses to reach the client program on each device and continuously exchanges keep-alive messages with it at a time interval preset according to actual needs. When the server-side program has received keep-alive messages from a client program n consecutive times (n can be set according to actual needs, for example 2 or 3), the communication connection with that client program is considered established.
  • The time interval can be called a cycle, and the preset time period can be 2 times the cycle.
  • A keep-alive message can include the priority information and the cycle information of the server-side program or client program that sent it.
  • After the server-side program receives a keep-alive message sent by a client program, it determines whether the keep-alive message is correct.
  • Whether a keep-alive message is correct can be judged by any rule set according to actual needs, such as checking whether the message contains the specified information (for example, priority information and cycle information); no specific limitation is imposed here.
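  • The keep-alive rules described above (connection established after n consecutive keep-alive messages, connection lost after more than two cycles of silence) can be sketched as a small state tracker. The class name `KeepAliveTracker` and the use of caller-supplied timestamps are illustrative choices, not part of the source.

```python
class KeepAliveTracker:
    """Tracks one client's keep-alive state on the server-side program."""

    def __init__(self, period: float, required: int = 2):
        self.period = period        # keep-alive cycle, in seconds
        self.required = required    # consecutive messages needed (n)
        self.count = 0
        self.last_seen = None
        self.connected = False

    def _expire(self, now: float) -> None:
        # Silence longer than twice the cycle means the connection is lost.
        if self.last_seen is not None and now - self.last_seen > 2 * self.period:
            self.count = 0
            self.connected = False

    def on_keepalive(self, now: float) -> None:
        """Record a correct keep-alive message received at time `now`."""
        self._expire(now)
        self.count += 1
        self.last_seen = now
        if self.count >= self.required:
            self.connected = True

    def check(self, now: float) -> bool:
        """Return whether the connection is currently considered alive."""
        self._expire(now)
        return self.connected
```

  • Passing the time in explicitly keeps the sketch testable; a real program would more likely read a monotonic clock.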
  • the server-side program obtains the connection relationship between the service interface of the TOR switch and the service interface of each server, which may include:
  • The server-side program closes the service interfaces of the TOR switch one at a time. After closing a service interface of the TOR switch, the server-side program queries, through the network management switch, which server service interface has lost its connection, and determines that the closed TOR switch service interface and that server service interface have a connection relationship.
  • That is, each service interface of the TOR switch can be closed in turn, and after each one is closed, the network management switch is used to query whether any server has a service interface that has lost its connection. If so, that service interface has a connection relationship with the currently closed service interface. The connection relationship can thus be determined accurately with simple operations.
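  • The close-and-query procedure can be sketched as below. The callbacks `set_port_enabled` and `query_lost_interfaces` are hypothetical stand-ins for the switch control and the out-of-band query, `FakeRack` simulates a cabinet for demonstration, and reopening each port after the query is an assumption the source does not state.

```python
def discover_connections(tor_ports, set_port_enabled, query_lost_interfaces):
    """Map each TOR port to the (server, interface) that drops when it closes."""
    connections = {}
    for port in tor_ports:
        set_port_enabled(port, False)
        try:
            lost = query_lost_interfaces()
        finally:
            # Assumption: restore the port so service traffic can resume.
            set_port_enabled(port, True)
        if len(lost) == 1:
            connections[port] = lost[0]
    return connections


class FakeRack:
    """In-memory stand-in for a cabinet; `wiring` is the hidden ground truth."""

    def __init__(self, wiring):
        self.wiring = wiring
        self.disabled = set()

    def set_port_enabled(self, port, enabled):
        if enabled:
            self.disabled.discard(port)
        else:
            self.disabled.add(port)

    def query_lost_interfaces(self):
        # A server interface loses its link while its TOR port is disabled.
        return [self.wiring[p] for p in sorted(self.disabled) if p in self.wiring]
```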
  • the server-side program obtains the connection relationship between the service interface of the TOR switch and the service interface of each server, including:
  • the server-side program obtains the preset connection relationship table, and obtains the connection relationship between the service interface of the TOR switch and the service interface of each server from the connection relationship table.
  • The correspondence between the service interfaces of the TOR switch and those of the servers can be defined in advance, and the service interfaces are then connected according to that correspondence (the MAC address of each server service interface is set according to the correspondence). The connection relationship table containing this correspondence can be stored in the network management network or in another location accessible to the server-side program, so that the server-side program can read the table to learn the connection relationship between the service interfaces of the TOR switch and the service interfaces of each server. Setting up the connection relationship in this way meets current practical needs, and obtaining it is simple and easy to implement.
  • The table can also be the connection relationship table described above.
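  • A preset connection relationship table could be loaded as in the sketch below. The CSV format, the column names, and the sample addresses are assumptions for illustration; the source does not say how the table is stored.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class ConnectionRecord:
    tor_port: int       # number of the TOR switch service interface
    bmc_ip: str         # BMC IP address of the connected server
    server_port: int    # number of the server service interface
    server_mac: str     # MAC address of the server service interface


def load_connection_table(text: str) -> dict:
    """Parse CSV text into {tor_port: ConnectionRecord}."""
    table = {}
    for row in csv.DictReader(io.StringIO(text)):
        record = ConnectionRecord(
            tor_port=int(row["tor_port"]),
            bmc_ip=row["bmc_ip"],
            server_port=int(row["server_port"]),
            server_mac=row["server_mac"].lower(),
        )
        table[record.tor_port] = record
    return table


# Hypothetical sample data using documentation-range IP addresses.
SAMPLE_TABLE = """tor_port,bmc_ip,server_port,server_mac
1,192.0.2.11,0,00:11:22:33:44:55
2,192.0.2.12,0,00:11:22:33:44:66
"""
```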
  • the server-side program obtains the connection relationship between the service interface of the TOR switch and the service interface of each server, which may include:
  • The server-side program queries each client program, through the network management switch, for the MAC addresses of the service interfaces of the device on which that client program runs, and establishes the correspondence between the MAC address of each server service interface and the MAC address of the corresponding TOR switch service interface, thereby establishing the connection relationship between the service interfaces of the TOR switch and the service interfaces of each server.
  • The server-side program can send a broadcast message through the network management switch and obtain the IP address of the device of each client program that responds. Based on each IP address, the server-side program sends an address request message to the corresponding client program and receives a message containing the MAC addresses of the service interfaces of that device, thereby obtaining the MAC address of every service interface of every device (each service interface has its own unique MAC address).
  • The connection relationship between the service interfaces of the TOR switch and the servers can then be established from the correspondence between MAC addresses: when a MAC address belonging to a TOR switch service interface and a MAC address belonging to a server service interface correspond, the two service interfaces have a connection relationship. The connection relationship is thus determined conveniently and accurately.
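  • The source does not spell out how the MAC correspondence is derived. The sketch below assumes one plausible reading: the TOR switch knows which peer MAC address sits behind each of its service interfaces, and that is joined with the MAC addresses the client programs report. The function name `match_by_mac` and both input shapes are hypothetical.

```python
def match_by_mac(tor_peer_macs: dict, client_reports: dict) -> dict:
    """Join TOR-side and server-side MAC data into a connection relationship.

    tor_peer_macs: {tor_port: peer MAC seen on that port} (assumed input).
    client_reports: {server: {interface: MAC}} as returned by client programs.
    Returns {tor_port: (server, interface)}.
    """
    owner = {}
    for server, interfaces in client_reports.items():
        for interface, mac in interfaces.items():
            owner[mac.lower()] = (server, interface)
    return {
        port: owner[mac.lower()]
        for port, mac in tor_peer_macs.items()
        if mac.lower() in owner
    }
```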
  • When the interface information of a service interface of any server changes: if that server runs the server-side program, the server-side program reports the changed interface information to the upper-layer program; if that server runs a client program, the client program sends the changed interface information to the server-side program, which reports it to the upper-layer program.
  • That is, when the interface information of a service interface of any server changes, the changed interface information is actively reported to the upper-layer program. If that server is running the server-side program, the server-side program reports the change directly; if it is running a client program, the client program sends the changed interface information to the server-side program, which then reports it.
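  • The change-reporting path can be sketched as a diff followed by a role-dependent dispatch. The names `changed_interfaces` and `report_changes`, the role strings, and the callback parameters are illustrative assumptions.

```python
def changed_interfaces(previous: dict, current: dict) -> dict:
    """Return the entries of `current` that are new or differ from `previous`."""
    return {
        name: info
        for name, info in current.items()
        if previous.get(name) != info
    }


def report_changes(role: str, changes: dict, send_to_server, report_upward) -> None:
    """Route changed interface information according to the local role."""
    if not changes:
        return
    if role == "server":
        # The server-side program reports directly to the upper-layer program.
        report_upward(changes)
    else:
        # A client program forwards the change to the server-side program.
        send_to_server(changes)
```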
  • When communication occurs between a server and the TOR switch, the server-side program records the corresponding communication information in the communication log.
  • The server-side program can also monitor the communication between the servers and the TOR switch in real time and record the corresponding communication information in the communication log for later query. The communication between a server and the TOR switch can include communication through the network management switch as well as communication over the link between the service interfaces; both fall within the protection scope of the present invention.
  • Each device includes both a server-side program and a client program.
  • At any given time, the server-side program runs normally on only one device; every other device runs its client program.
  • Running, or running normally, here means being in the active state: only one device's server-side program can be active at a time, and that program performs the corresponding functions.
  • The other devices run client programs, so that the communication between the TOR switch and the servers is realized through the communication between the server-side program and the client programs.
  • A server-side program and a client program are installed on every device so that, if one device fails to activate its server-side program, another device can activate its own. This provides a backup for the server-side program and makes link detection more reliable.
  • The device that activates its server-side program can be chosen according to device priority, and the TOR switch can be given the highest priority, so the server-side program on the TOR switch is usually activated first. If that activation succeeds, no other device needs to activate its server-side program. If it fails, the server-side program on the highest-priority server is activated; if that succeeds, no other device needs to activate its server-side program; otherwise the server-side program on the server with the second-highest priority is activated, and so on.
  • To determine which device should activate its server-side program, each device whose server-side program is inactive can periodically check whether a keep-alive message (the same as the keep-alive message described above) is present on the network management network. The keep-alive message includes the priority information of the device whose server-side program is currently active. If that priority is lower than the checking device's own priority, the checking device concludes that it has the higher priority and activates its own server-side program; otherwise it concludes that its priority is lower and does not activate its server-side program.
  • In a specific application scenario, the link detection method provided by the embodiment of the present invention may include:
  • After the server-side program is activated, it reads the MAC table of the network management switch to obtain the MAC addresses of the devices connected to all interfaces except the uplink port, and uses those MAC addresses to reach the client program running on the BMC of each device.
  • Keep-alive messages are then exchanged continuously at a predefined interval. When the server-side program receives two correct keep-alive messages in a row, the connection is considered established; if no keep-alive message arrives within twice the predefined keep-alive sending period, the connection is considered lost.
  • The content of the keep-alive message may include the priority of the sending device, the keep-alive exchange period, and so on.
  • The connection relationship between each TOR switch service interface and the BMC IP address of the server attached to that interface is then determined; it can be recorded in a connection relationship table.
  • The connection relationship table can record the number of the TOR switch service interface, the BMC IP address of the server attached to that interface, the number of the server service interface connected to it, the MAC address of that server service interface, and so on;
  • The connection relationship can also be identified by comparing the queried MAC addresses of the servers' service interfaces against the MAC table of the TOR switch, which shows which TOR switch service interface is connected to each server service interface reached through a given IP;
  • The connection relationship can also be determined by shutting down each service interface of the TOR switch, reading the host information (held in the network management switch) through the management channel, and searching for the server service interface that has also lost its connection, which finally determines the connection between the switch's service interface and the server's service interface.
  • At a preset period, the server-side program uses communication messages to query the interface information of the service interfaces on every device whose client program has an established connection, and it can also send control instructions in communication messages. Besides periodic queries, a change in the state of a server's service interface is proactively reported to the upper-layer program.
  • The query can be driven by a timed period or triggered by an interrupt event, in which case the information is transmitted immediately without waiting; interrupt events include, but are not limited to, changes in interface information.
  • The server-side program records all communication logs.
  • The links in this application include, but are not limited to, Ethernet links, Fibre Channel, and InfiniBand.
  • The technical solutions disclosed in this application can be used for link information exchange and link control, providing out-of-band link negotiation and fault diagnosis. They can also identify the connection relationship between the service interfaces of the TOR switch and those of the servers within the whole rack, so that the TOR switch knows the BMC network management IP address of the server attached to each of its service interfaces. This improves system reliability and avoids the drawback of traditional protocols such as AN and LLDP, which work in-band and are unavailable when the link fails.
  • Functions such as link connection mode negotiation, state synchronization, automatic energy saving, flow control, and link fault diagnosis can also be built on the disclosed solution, improving the reliability of the links between servers and the TOR switch in the data center, raising diagnosis efficiency, shortening fault recovery time, and creating considerable economic benefit.
  • An embodiment of the present invention also provides a link detection system, as shown in FIG. 3, including a server-side program 11 and a plurality of client-side programs 12 (FIG. 3 illustrates three client programs, denoted 121, 122, and 123). The server-side program 11 runs on any one device, and the client-side programs 12 run on the devices other than that device.
  • The devices include the TOR switch and the servers, where:
  • The server-side program 11 is used to: obtain the connection relationship between the service interfaces of the TOR switch and the service interfaces of each server; periodically send query messages to each client program 12 through the network management switch; and report to the upper-layer program both the interface information of those service interfaces on its own device that are connected to service interfaces of other devices and the interface information it receives, so that the upper-layer program can analyze, based on the interface information, the connection status of the links that realize the corresponding connection relationships;
  • Each client program 12 is used to respond to the query message and return to the server-side program 11, through the network management switch, the interface information of those service interfaces on its own device that are connected to service interfaces of other devices.

Abstract

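A link detection method and system. The method includes: a server-side program obtains the connection relationship between the service interfaces of a TOR switch and the service interfaces of each server; the server-side program periodically sends a query message to each client program through a network management switch; each client program responds to the query message and returns to the server-side program, through the network management switch, the interface information of those service interfaces on its TOR switch or server that are connected to service interfaces of other devices; the server-side program reports to an upper-layer program both the interface information of those service interfaces on its own device that are connected to service interfaces of other devices and the interface information it receives, so that the upper-layer program can analyze, based on the interface information, the connection status of the links that realize the corresponding connection relationships. The corresponding links are thus monitored effectively through out-of-band communication, which indirectly improves their reliability.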

Description

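Link detection method and system
This application claims priority to Chinese patent application No. 202010476918.9, entitled "Link detection method and system", filed with the China Patent Office on May 29, 2020, the entire contents of which are incorporated herein by reference.
Technical Field
The present invention relates to the technical field of network links, and more specifically to a link detection method and system.
Background
With the growth of the economy and the Internet industry, society places ever higher demands on data center performance and reliability. In newly built data centers the integrated rack has become mainstream: a single rack holds several servers, a network management switch, and a top-of-rack switch (TOR (Top of Rack) switch, which sits in the server rack and connects the servers in the rack to the upper-layer network of the data center). The network management switch connects to the network management interfaces of all devices in the rack so that those devices can be controlled and managed remotely; the TOR switch connects to the service interfaces of all servers in the rack to carry service data between the servers and the upstream network. In practice, however, the link (specifically, the network link) between the TOR switch and a server may fail, which directly interrupts the server's services and causes economic loss.
Summary of the Invention
The purpose of the present invention is to provide a link detection method and system that use out-of-band communication to supply the information needed for fault detection and diagnosis of the links between the TOR switch and the servers, thereby monitoring those links effectively, indirectly improving their reliability, and to some extent avoiding server service interruptions caused by link problems.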
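To achieve the above purpose, the present invention provides the following technical solutions:
A link detection method, including:
A server-side program obtains the connection relationship between the service interfaces of a TOR switch and the service interfaces of each server; the server-side program runs on any one device, the client programs run on the devices other than that device, and the devices include the TOR switch and the servers;
The server-side program periodically sends a query message to each client program through a network management switch; each client program responds to the query message and returns to the server-side program, through the network management switch, the interface information of those service interfaces on its own device that are connected to service interfaces of other devices;
The server-side program reports to an upper-layer program both the interface information of those service interfaces on its own device that are connected to service interfaces of other devices and the interface information it receives, so that the upper-layer program can analyze, based on the interface information, the connection status of the links that realize the corresponding connection relationships.
Preferably, the method further includes:
The server-side program periodically exchanges keep-alive messages with each client program through the network management switch; if no keep-alive message from a given client program is received within a specified time, the server-side program determines that the connection with that client program, over which information can be exchanged through the network management switch, has been lost.
Preferably, the server-side program obtaining the connection relationship between the service interfaces of the TOR switch and the service interfaces of each server includes:
The server-side program shuts down the service interfaces of the TOR switch one at a time; after shutting down any service interface of the TOR switch, the server-side program queries, through the network management switch, for the server service interface that has lost its connection, and determines that the shut-down TOR switch service interface and the server service interface that lost its connection are connected to each other.
Preferably, the server-side program obtaining the connection relationship between the service interfaces of the TOR switch and the service interfaces of each server includes:
The server-side program obtains a preset connection relationship table and reads from it the connection relationship between the service interfaces of the TOR switch and the service interfaces of each server.
Preferably, the server-side program obtaining the connection relationship between the service interfaces of the TOR switch and the service interfaces of each server includes:
The server-side program queries each client program, through the network management switch, for the MAC addresses of the service interfaces on that client program's device, and establishes the correspondence between the queried MAC addresses of the servers' service interfaces and the MAC addresses of the TOR switch's service interfaces, thereby establishing the connection relationship between the service interfaces of the TOR switch and those of each server.
Preferably, the method further includes:
When the interface information of a service interface of any server changes: if that server runs the server-side program, the server-side program reports the changed interface information to the upper-layer program; if that server runs a client program, the client program sends the changed interface information to the server-side program, which reports it to the upper-layer program.
Preferably, the method further includes:
When communication takes place between a server and the TOR switch, the server-side program records the corresponding communication information in a communication log.
Preferably, each device includes both the server-side program and the client program; at any given moment the server-side program can run normally on only one device, and every other device can run its own client program normally.
A link detection system, including a server-side program and a plurality of client programs, where the server-side program runs on any one device, the client programs run on the devices other than that device, and the devices include a TOR switch and servers; where:
The server-side program is used to: obtain the connection relationship between the service interfaces of the TOR switch and the service interfaces of each server; periodically send a query message to each client program through a network management switch; and report to an upper-layer program both the interface information of those service interfaces on its own device that are connected to service interfaces of other devices and the interface information it receives, so that the upper-layer program can analyze, based on the interface information, the connection status of the links that realize the corresponding connection relationships;
Each client program is used to: respond to the query message and return to the server-side program, through the network management switch, the interface information of those service interfaces on its own device that are connected to service interfaces of other devices.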
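The present invention provides a link detection method and system. The method includes: a server-side program obtains the connection relationship between the service interfaces of a TOR switch and the service interfaces of each server, where the server-side program runs on any one device, the client programs run on the devices other than that device, and the devices include the TOR switch and the servers; the server-side program periodically sends a query message to each client program through a network management switch; each client program responds to the query message and returns to the server-side program, through the network management switch, the interface information of those service interfaces on its own device that are connected to service interfaces of other devices; the server-side program reports to an upper-layer program both the interface information of those service interfaces on its own device that are connected to service interfaces of other devices and the interface information it receives, so that the upper-layer program can analyze, based on the interface information, the connection status of the links that realize the corresponding connection relationships. In this application, the server-side program and the client programs exchange information through the network management switch, and they run on the TOR switch and/or the servers. In this way the interface information of the service interfaces of every server connected to the TOR switch is obtained, and the upper-layer program uses it to monitor the connection status of the corresponding links, so that when a link connection becomes abnormal the upper-layer program can promptly handle the fault, for example by repairing the link. Because the TOR switch and the servers exchange this information through the network management switch, the connection status of the links between the TOR switch and the servers can still be learned even when a direct link between them has failed. This out-of-band communication supplies the information needed for fault detection and diagnosis of the links, monitors them effectively, indirectly improves their reliability, and to some extent avoids server service interruptions caused by link problems.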
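Brief Description of the Drawings
To explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are merely embodiments of the present invention, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flowchart of a link detection method provided by an embodiment of the present invention;
FIG. 2 is a schematic diagram of the program environment in a link detection method provided by an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a link detection apparatus provided by an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art from the embodiments of the present invention without creative effort fall within the protection scope of the present invention.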
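Referring to FIG. 1, which shows a flowchart of a link detection method provided by an embodiment of the present invention, the method may include:
S11: A server-side program obtains the connection relationship between the service interfaces of a TOR switch and the service interfaces of each server; the server-side program runs on any one device, the client programs run on the devices other than that device, and the devices include the TOR switch and the servers.
In this embodiment, the client programs and the server-side program may run on the BMC (Baseboard Management Controller) chips of the TOR switch and of each corresponding server within the integrated rack. Taking a single server as an example, the program environment of the client program and the server-side program may be as shown in FIG. 2, where the service network interface is the service interface and the BMC network management interface may be called the network management interface for short. A server can communicate with the network management switch through its network management interface and with the TOR switch through its service interface. Correspondingly, the TOR switch includes service interfaces connected to the servers' service interfaces, and the connection between the service interfaces of the two kinds of device forms the link over which they communicate. This application is designed to detect the link through which a server connects to the TOR switch via a service interface. Obtaining the connection relationship between the service interfaces of the TOR switch and those of each server means determining which service interface of which server each service interface of the TOR switch is connected to. From this connection relationship, the service interfaces belonging respectively to the TOR switch and to any given server that form the link between them can be identified; that is, the mutually connected service interfaces of the two kinds of device can be located.
In this embodiment, a server-side program and client programs that can communicate with each other may be provided. Specifically, the server-side program communicates with each client program over the network management network provided by the network management switch, that is, out of band. In general, among the TOR switch and the servers connected to it through service interfaces, only one device can run the server-side program; the other devices run client programs, so that the communication between the server-side program and the client programs makes it possible to monitor every service interface on the servers and the TOR switch.
S12: The server-side program periodically sends a query message to each client program through the network management switch; each client program responds to the query message and returns to the server-side program, through the network management switch, the interface information of those service interfaces on its own device that are connected to service interfaces of other devices.
Once the connection relationship between the service interfaces of the TOR switch and those of the servers has been determined, the server-side program can communicate periodically with each client program over the network management network provided by the network management switch, to obtain the interface state information (interface information for short) of the service interfaces on each client program's device. Specifically, the server-side program can periodically send each client program a communication message serving as a query message. The query message can encapsulate sub-protocol communication information on top of a TCP (Transmission Control Protocol) message or a UDP (User Datagram Protocol) message, for example the length of the protocol message content, the interface number, the interface MAC (Media Access Control) address, a flag (query flag, control flag, or feedback flag), and instruction information (a query or control instruction, or feedback information). After receiving a query message carrying the query flag, a client program returns the queried information to the server-side program in a communication message. In this application the queried information is the interface information of those service interfaces on the receiving client program's device that are connected to service interfaces of other devices. Interface information describes the state of the corresponding service interface and can be used to determine whether the state of the corresponding link is normal, that is, whether the link is connected or disconnected. For example, when the service interface is an Ethernet interface, the interface information may include the physical transceiver type, the supported rates, the media type, and so on; further, when the physical transceiver is an optical module, the interface information may include the received optical power, the transmitted optical power, the temperature, whether the CDR (clock and data recovery) has locked, and similar information, all of which can be used to determine the state of the link. Determining the state of a link from information describing its interfaces follows the same principle as the corresponding prior-art solutions and is not described further here.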
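As a rough illustration of the sub-protocol fields listed above, the sketch below packs and unpacks a message that would travel as the payload of a TCP segment or UDP datagram. The field order, the field widths, the flag values, and the names `pack_message` and `unpack_message` are assumptions made for this example; the source names the fields but does not define a wire layout.

```python
import struct

# Hypothetical layout (not specified by the source):
# flag (1 byte), interface number (2 bytes), interface MAC (6 bytes),
# content length (2 bytes), followed by the instruction/feedback content.
HEADER = struct.Struct("!BH6sH")
FLAG_QUERY, FLAG_CONTROL, FLAG_FEEDBACK = 1, 2, 3  # illustrative values


def pack_message(flag, if_number, mac, content=b""):
    """Build one sub-protocol message as bytes."""
    return HEADER.pack(flag, if_number, mac, len(content)) + content


def unpack_message(data):
    """Parse a message and check that the declared content length is present."""
    flag, if_number, mac, length = HEADER.unpack_from(data)
    content = data[HEADER.size:HEADER.size + length]
    if len(content) != length:
        raise ValueError("truncated content")
    return flag, if_number, mac, content
```

A length field of this kind matters mainly over TCP, where message boundaries are not preserved and the receiver has to know how many bytes belong to each message.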
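In addition, if the server-side program needs to carry out a control operation, it can send each client program a communication message serving as a control message. After receiving a control message carrying the control flag, the client program performs the control action the message specifies, including but not limited to modifying the relevant interface registers, changing the interface rate, and changing the interface working mode (simplex/duplex, whether energy-saving mode is on, and so on). After performing the control action, the client program can also send the server-side program feedback indicating whether the operation succeeded. The purpose of the feedback is to let the upper-layer program apply the correct policy: for example, when the link-layer connection of an interface is detected as down, the received optical power can be used to judge whether a signal is being received, and if so, a change of link-layer working mode can be attempted, which gives the system the ability to adapt automatically.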
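The adaptive policy mentioned at the end of this paragraph can be sketched as a small decision function. The threshold value, the action names, and the function name `next_action` are hypothetical; the source only says that received optical power can indicate whether a signal is present and that a mode change may then be attempted.

```python
def next_action(link_up, rx_power_dbm, rx_threshold_dbm=-14.0):
    """Choose a follow-up action for one interface.

    rx_threshold_dbm is an illustrative value, not taken from the source;
    a real threshold would depend on the optical module in use.
    """
    if link_up:
        return "none"
    if rx_power_dbm is not None and rx_power_dbm >= rx_threshold_dbm:
        # A signal is arriving but the link layer is down:
        # try a different link-layer working mode.
        return "retry_link_mode"
    # No usable signal: a mode change is unlikely to help.
    return "check_physical_path"
```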
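S13: The server-side program reports to the upper-layer program both the interface information of those service interfaces on its own device that are connected to service interfaces of other devices and the interface information it receives, so that the upper-layer program can analyze, based on the interface information, the connection status of the links that realize the corresponding connection relationships.
After collecting the interface information of the relevant service interfaces from the device of each client program, the server-side program also needs to obtain the interface information of those service interfaces on its own device that are connected to service interfaces of other devices. It then reports both its own interface information and the interface information collected from the client programs to the upper-layer program. Based on all of this interface information, the upper-layer program determines whether the link between each pair of connected service interfaces is in a normal connected state, so that repair or other handling can be carried out promptly when a link is disconnected or otherwise abnormal.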
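One way an upper-layer program might combine the reported information is sketched below. It assumes each report has already been reduced to a single `link_up` boolean per interface, which is a simplification: the source leaves the actual analysis to known techniques. The names `InterfaceInfo` and `analyse_links` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InterfaceInfo:
    device: str      # e.g. "tor" or a server name
    if_number: int   # service interface number on that device
    link_up: bool    # simplified state derived from the interface information


def analyse_links(connections, infos):
    """Classify each connected interface pair.

    connections: iterable of ((tor_device, tor_if), (server_device, server_if))
    infos: iterable of InterfaceInfo gathered by the server-side program
    """
    by_key = {(i.device, i.if_number): i for i in infos}
    status = {}
    for tor_end, server_end in connections:
        a, b = by_key.get(tor_end), by_key.get(server_end)
        if a is None or b is None:
            status[(tor_end, server_end)] = "unknown"
        elif a.link_up and b.link_up:
            status[(tor_end, server_end)] = "connected"
        else:
            status[(tor_end, server_end)] = "disconnected"
    return status
```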
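In this application, the server-side program and the client programs exchange information through the network management switch, and they run on the TOR switch and/or the servers. In this way the interface information of the service interfaces of every server connected to the TOR switch is obtained, and the upper-layer program uses it to monitor the connection status of the corresponding links, so that when a link connection becomes abnormal the upper-layer program can promptly handle the fault, for example by repairing the link. Because the TOR switch and the servers exchange this information through the network management switch, the connection status of the links between the TOR switch and the servers can still be learned even when a direct link between them has failed. This out-of-band communication supplies the information needed for fault detection and diagnosis of the links, monitors them effectively, indirectly improves their reliability, and to some extent avoids server service interruptions caused by link problems.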
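The link detection method provided by an embodiment of the present invention may further include:
The server-side program periodically exchanges keep-alive messages with each client program through the network management switch; if no keep-alive message from a given client program is received within a specified time, the server-side program determines that the connection with that client program, over which information can be exchanged through the network management switch, has been lost.
It should be noted that, while the server-side program periodically obtains the interface information of the service interfaces on each client program's device, it can also periodically exchange keep-alive messages with each client program over the network management network provided by the network management switch. Specifically, the server-side program periodically sends a keep-alive message to each client program through the network management switch, and each client program returns a keep-alive message on receiving one. This shows whether the server-side program and the client program can communicate normally through the network management switch. If the server-side program receives no keep-alive message back from a given client program within a preset time span after sending its own (the specified time, which can be set according to actual needs), the connection between that client program and the server-side program is considered lost and communication between them can no longer continue. The connection between the client programs and the server-side program is thus monitored in real time, and when a connection is found to be lost, repair or other action can be taken promptly.
To make the state of the connections between the client programs and the server-side program easier to follow, the loss of a connection between any client program and the server-side program can also be reported to the upper-layer program, so that abnormal conditions are reported promptly.
Furthermore, before any communication between the server-side program and the client programs has begun, the communication connection between them is established as follows. The server-side program reads the MAC table stored in the network management switch and obtains from it the MAC addresses of the devices connected to all interfaces except the uplink port, that is, the MAC addresses of the devices hosting all client programs. It then reaches the client program running on each device through the MAC address and exchanges keep-alive messages with it continuously at a time interval set in advance according to actual needs (periodically). When the server-side program has received keep-alive messages from a client program n times in a row (n can be set according to actual needs, for example 2 or 3), the communication connection with that client program is considered established. The time interval can be called the period, and the preset time span can be twice the period. A keep-alive message may include the priority of the device hosting the server-side program or client program that sends it, and the period. After receiving a keep-alive message from a client program, the server-side program also checks whether it is correct: if so, the keep-alive message counts as received; otherwise it is treated as not received. Correctness can be judged by any rule set according to actual needs, for example whether the message contains the specified information (such as the priority and the period), and is not specifically limited here.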
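The establishment and loss rules described above (n consecutive correct keep-alive messages to establish, silence longer than twice the period to lose) can be sketched as a small per-client tracker. The class name `KeepAliveTracker` is hypothetical, and resetting the consecutive count on an incorrect message is an assumption; the source only says such a message is treated as not received.

```python
class KeepAliveTracker:
    """Tracks one client connection as seen by the server-side program."""

    def __init__(self, period, required=2):
        self.period = period        # keep-alive sending period, in seconds
        self.required = required    # consecutive correct messages needed
        self.consecutive = 0
        self.last_seen = None
        self.established = False

    def on_keepalive(self, now, valid=True):
        """Record a keep-alive message that arrived at time `now`."""
        if not valid:
            self.consecutive = 0    # assumption: an incorrect message breaks the run
            return
        self.consecutive += 1
        self.last_seen = now
        if self.consecutive >= self.required:
            self.established = True

    def check(self, now):
        """Return True while the connection counts as established."""
        if self.last_seen is not None and now - self.last_seen > 2 * self.period:
            self.established = False
            self.consecutive = 0
        return self.established
```

Passing the current time in as an argument, rather than reading a clock inside the class, keeps the logic easy to test.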
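In the link detection method provided by an embodiment of the present invention, the server-side program obtaining the connection relationship between the service interfaces of the TOR switch and the service interfaces of each server may include:
The server-side program shuts down the service interfaces of the TOR switch one at a time; after shutting down any service interface of the TOR switch, the server-side program queries, through the network management switch, for the server service interface that has lost its connection, and determines that the shut-down TOR switch service interface and the server service interface that lost its connection are connected to each other.
To obtain the connection relationship between the service interfaces of the TOR switch and those of the servers, the service interfaces of the TOR switch can be shut down one at a time. After any one service interface is shut down, the servers are queried through the network management switch for a service interface that has lost its connection. If one exists, that service interface is connected to the service interface that was just shut down. The connection relationship is thus determined accurately with a simple procedure.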
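The shut-down probing procedure can be sketched as follows. The callables `shut`, `restore`, and `query_down_server_ifs` are hypothetical stand-ins for whatever switch control and out-of-band query mechanisms a deployment provides. Re-enabling each port after probing, and ignoring ambiguous results, are assumptions of this sketch; a real implementation would also need to wait for link state to settle.

```python
def discover_by_shutdown(tor_ports, shut, restore, query_down_server_ifs):
    """Map each TOR port to the server interface that drops when it is shut.

    shut(port) / restore(port): disable and re-enable one TOR service interface.
    query_down_server_ifs(): server interfaces currently reported as down,
    obtained out of band through the network management switch.
    """
    mapping = {}
    for port in tor_ports:
        before = set(query_down_server_ifs())
        shut(port)
        try:
            newly_down = set(query_down_server_ifs()) - before
        finally:
            restore(port)  # assumption: ports are brought back up after probing
        if len(newly_down) == 1:
            mapping[port] = newly_down.pop()
        # zero or several newly-down interfaces: leave this port unmapped
    return mapping
```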
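In the link detection method provided by an embodiment of the present invention, the server-side program obtaining the connection relationship between the service interfaces of the TOR switch and the service interfaces of each server may include:
The server-side program obtains a preset connection relationship table and reads from it the connection relationship between the service interfaces of the TOR switch and the service interfaces of each server.
The correspondence between the service interfaces of the TOR switch and those of the servers can be defined in advance, and the service interfaces are then connected according to it (the MAC address of each server service interface is set according to this correspondence). The connection relationship table containing this correspondence can be stored on the network management network or at any other location the server-side program can access, so that the server-side program learns the connection relationship between the service interfaces of the TOR switch and those of each server simply by reading the table. Setting up the connection relationship in this way meets current practical needs, and obtaining it is simple and easy to implement.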
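A preset connection relationship table could be represented along the following lines. The record fields mirror the ones the description lists for the table (TOR interface number, server BMC IP address, server interface number, server interface MAC address); the names `ConnectionRecord` and `index_by_tor_interface`, and the rule that each TOR interface appears at most once, are assumptions of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConnectionRecord:
    tor_if: int          # TOR switch service interface number
    bmc_ip: str          # BMC IP address of the attached server
    server_if: int       # service interface number on that server
    server_if_mac: str   # MAC address of that server service interface


def index_by_tor_interface(records):
    """Build a lookup keyed by TOR interface, rejecting duplicate entries."""
    table = {}
    for rec in records:
        if rec.tor_if in table:
            raise ValueError(f"duplicate entry for TOR interface {rec.tor_if}")
        table[rec.tor_if] = rec
    return table
```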
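In addition, the correspondence between the interfaces of the network management switch and the servers, and between the interfaces of the network management switch and the TOR switch, can also be defined in advance and stored in a table, which may of course be the connection relationship table described above.
In the link detection method provided by an embodiment of the present invention, the server-side program obtaining the connection relationship between the service interfaces of the TOR switch and the service interfaces of each server may include:
The server-side program queries each client program, through the network management switch, for the MAC addresses of the service interfaces on that client program's device, and establishes the correspondence between the queried MAC addresses of the servers' service interfaces and the MAC addresses of the TOR switch's service interfaces, thereby establishing the connection relationship between the service interfaces of the TOR switch and those of each server.
The server-side program can send a broadcast message through the network management switch and collect the IP address of each device returned by its client program in response. Using these IP addresses, the server-side program sends an address request message to each client program and receives in return a message containing the MAC addresses of the service interfaces on that device (every service interface has a MAC address that corresponds only to itself). Once the MAC addresses of the service interfaces of all servers and of the TOR switch have been obtained, the connection relationship between the service interfaces of the TOR switch and those of the servers can be built from the correspondence between MAC addresses: if the MAC address of a TOR switch service interface and the MAC address of a server service interface correspond to each other, the two service interfaces are connected. The connection relationship is thus determined conveniently and accurately.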
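A sketch of the MAC-based matching is given below. It takes the comparison against the TOR switch's MAC table, which the description mentions later as one way to identify the connection relationship, as the concrete form of the "correspondence between MAC addresses"; that reading, the data shapes, and the name `match_by_mac` are assumptions.

```python
def match_by_mac(tor_mac_table, server_ifs):
    """Match server service interfaces to TOR ports by MAC address.

    tor_mac_table: {tor_port: iterable of MAC strings learned on that port}
    server_ifs: iterable of (bmc_ip, if_number, mac) reported by client programs
    Returns {tor_port: (bmc_ip, if_number)}.
    """
    port_by_mac = {}
    for port, macs in tor_mac_table.items():
        for mac in macs:
            port_by_mac[mac.lower()] = port  # compare case-insensitively

    mapping = {}
    for bmc_ip, if_number, mac in server_ifs:
        port = port_by_mac.get(mac.lower())
        if port is not None:
            mapping[port] = (bmc_ip, if_number)
    return mapping
```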
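The link detection method provided by an embodiment of the present invention may further include:
When the interface information of a service interface of any server changes: if that server runs the server-side program, the server-side program reports the changed interface information to the upper-layer program; if that server runs a client program, the client program sends the changed interface information to the server-side program, which reports it to the upper-layer program.
So that the interface information of every service interface is obtained promptly, in this embodiment a change in the interface information of any server's service interface causes the server-side program to report the changed interface information to the upper-layer program on its own initiative. If the server in question runs the server-side program, the server-side program reports the changed interface information directly; if it runs a client program, the client program sends the changed interface information to the server-side program, which then reports it.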
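The change-triggered path can be sketched in two small functions: one that finds which interfaces changed between two snapshots, and one that routes the changes according to the role of the local program. The snapshot format, the role strings, and both function names are hypothetical.

```python
def changed_interfaces(previous, current):
    """Return the entries of `current` that are new or differ from `previous`.

    Both arguments map an interface identifier to its interface information.
    """
    return {key: info for key, info in current.items() if previous.get(key) != info}


def forward_changes(role, changes, report_to_upper, send_to_server):
    """Route changed interface information according to the local role.

    role: "server" if this device runs the active server-side program,
    anything else if it runs a client program.
    """
    target = report_to_upper if role == "server" else send_to_server
    for key, info in changes.items():
        target(key, info)
```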
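The link detection method provided by an embodiment of the present invention may further include:
When communication takes place between a server and the TOR switch, the server-side program records the corresponding communication information in a communication log.
The server-side program can also monitor in real time the communication between the servers and the TOR switch and record the corresponding communication information in a communication log for later reference. Communication between a server and the TOR switch may take place through the network management switch or over the link between service interfaces; both fall within the protection scope of the present invention.
In the link detection method provided by an embodiment of the present invention, each device includes both the server-side program and the client program; at any given moment the server-side program can run normally on only one device, and every other device can run its own client program normally.
It should be noted that "running" or "running normally" in this application means being in the active state: only one device's server-side program can be active at a time, and that program is run to perform the corresponding functions, while the other devices run their client programs, so that communication between the TOR switch and the servers is carried by the exchange between the server-side program and the client programs. Each device carries both a server-side program and a client program so that, if one device fails to activate its server-side program, another device can activate its own; backing up the server-side program in this way makes link detection more reliable. In one implementation, the device that activates its server-side program is chosen by device priority, and the TOR switch can be given the highest priority, so its server-side program is normally activated first. If that activation succeeds, no other device needs to activate its server-side program. If it fails, the server with the highest priority among the servers activates its server-side program; if that succeeds, no further activation is needed; otherwise the server with the second-highest priority activates its server-side program, and so on. To work out which device should currently activate the server-side program, any device whose server-side program is inactive can periodically check whether a keep-alive message (the same keep-alive message described above) is present on the network management network. The keep-alive message carries the priority of the device hosting the currently active server-side program. If that priority is lower than the checking device's own, the checking device concludes that it has the higher priority and activates its server-side program; otherwise it concludes that its own priority is lower and does not activate its server-side program.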
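The priority rule can be sketched with two functions: a local decision made by a device that observes keep-alive messages, and a priority-ordered activation loop. Larger numbers mean higher priority here, and activating when no keep-alive message is observed at all is an assumption; the source describes only the comparison of priorities. The names `should_activate` and `elect` are hypothetical.

```python
def should_activate(own_priority, active_priority):
    """Decide whether this device should activate its server-side program.

    active_priority: priority carried in the observed keep-alive message,
    or None if no keep-alive message was seen on the management network.
    """
    if active_priority is None:
        return True  # assumption: no active server-side program was observed
    return active_priority < own_priority


def elect(devices, try_activate):
    """Try devices from highest to lowest priority until one activates.

    devices: iterable of (name, priority); try_activate(name) returns True
    when that device's server-side program was activated successfully.
    """
    for name, _priority in sorted(devices, key=lambda d: d[1], reverse=True):
        if try_activate(name):
            return name
    return None
```

Giving the TOR switch the largest priority value reproduces the ordering in the description, where the switch is tried first and the servers follow in descending priority.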
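In a specific application scenario, the link detection method provided by an embodiment of the present invention may include:
1. After the server-side program is activated, it reads the MAC table of the network management switch, obtains the MAC addresses of the devices connected to all interfaces except the uplink port, uses those MAC addresses to reach the client program running on the BMC of each device, and exchanges keep-alive messages continuously at a predefined interval. When the server-side program receives two correct keep-alive messages in a row, the connection is considered established; if no keep-alive message arrives within twice the predefined keep-alive sending period, the connection is considered lost. The content of the keep-alive message may include the priority of the sending device, the keep-alive exchange period, and so on.
2. The connection relationship between each TOR switch service interface and the IP address (Internet Protocol address) of the BMC of the server attached to that interface (or, put differently, the server's service port) is determined. The connection relationship can be recorded in a connection relationship table, which can record the number of the TOR switch service interface, the BMC IP address of the server attached to that interface, the number of the server service interface connected to it, the MAC address of that server service interface, and so on. The connection relationship can also be identified by comparing the queried MAC addresses of the servers' service interfaces against the MAC table of the TOR switch, which shows which TOR switch service interface is connected to each server service interface reached through a given IP. The connection relationship can also be determined by shutting down each service interface of the TOR switch, reading the host information (held in the network management switch) through the management channel, and searching for the server service interface that has also lost its connection, which finally determines the connection between the switch's service interface and the server's service interface.
3. At a preset period, the server-side program uses communication messages to query the interface information of the service interfaces on every device whose client program has an established connection, and it can also send control instructions in communication messages. Besides periodic queries, a change in the state of a server's service interface is proactively reported to the upper-layer program.
4. The query can be driven by a timed period or triggered by an interrupt event, in which case the information is transmitted immediately without waiting; interrupt events include, but are not limited to, changes in interface information. The server-side program records all communication logs. In addition, the links in this application include, but are not limited to, Ethernet links, Fibre Channel, and InfiniBand.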
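Step 4 describes a query that runs either when the timer expires or as soon as an interrupt event arrives. A minimal way to express that in Python is to wait on a `threading.Event` with a timeout; the function name `wait_for_next_query` and the returned labels are hypothetical.

```python
import threading


def wait_for_next_query(change_event, period):
    """Block until the next query should run and say what triggered it.

    change_event: a threading.Event set by whatever detects an interface change.
    period: the regular query period in seconds.
    Returns "event" if the change event fired, "timer" if the period elapsed.
    """
    triggered = change_event.wait(timeout=period)
    change_event.clear()  # a change arriving exactly here would be missed in this sketch
    return "event" if triggered else "timer"
```

A polling loop would call this function repeatedly and issue a query after each return, so a state change shortens the wait instead of adding a separate code path.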
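The technical solutions disclosed in this application can be used for link information exchange and link control, providing out-of-band link negotiation and fault diagnosis. They can also identify the connection relationship between the service interfaces of the TOR switch and those of the servers within the whole rack, so that the TOR switch knows the BMC network management IP address of the server attached to each of its service interfaces. This improves system reliability and avoids the drawback of traditional protocols such as AN and LLDP, which work in-band and are unavailable when the link fails. In addition, functions such as link connection mode negotiation, state synchronization, automatic energy saving, flow control, and link fault diagnosis can be built on the disclosed solution, improving the reliability of the links between servers and the TOR switch in the data center, raising diagnosis efficiency, shortening fault recovery time, and creating considerable economic benefit.
An embodiment of the present invention also provides a link detection system, as shown in FIG. 3, including a server-side program 11 and a plurality of client programs 12 (FIG. 3 illustrates three client programs, denoted 121, 122, and 123). The server-side program 11 runs on any one device, the client programs 12 run on the devices other than that device, and the devices include the TOR switch and the servers; where:
The server-side program 11 is used to: obtain the connection relationship between the service interfaces of the TOR switch and the service interfaces of each server; periodically send a query message to each client program 12 through the network management switch; and report to the upper-layer program both the interface information of those service interfaces on its own device that are connected to service interfaces of other devices and the interface information it receives, so that the upper-layer program can analyze, based on the interface information, the connection status of the links that realize the corresponding connection relationships;
Each client program 12 is used to: respond to the query message and return to the server-side program 11, through the network management switch, the interface information of those service interfaces on its own device that are connected to service interfaces of other devices.
It should be noted that, for the relevant parts of the link detection system provided by the embodiment of the present invention, reference can be made to the detailed description of the corresponding parts of the link detection method provided by the embodiment of the present invention, which is not repeated here. In addition, the parts of the above technical solutions whose implementation principles are the same as those of the corresponding prior-art solutions are not described in detail, to avoid redundancy.
The above description of the disclosed embodiments enables a person skilled in the art to implement or use the present invention. Various modifications to these embodiments will be obvious to a person skilled in the art, and the general principles defined herein can be implemented in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention is not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.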

Claims (9)

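  1. A link detection method, characterized by including:
    a server-side program obtaining the connection relationship between the service interfaces of a TOR switch and the service interfaces of each server, where the server-side program runs on any one device, the client programs run on the devices other than that device, and the devices include the TOR switch and the servers;
    the server-side program periodically sending a query message to each client program through a network management switch, and each client program responding to the query message and returning to the server-side program, through the network management switch, the interface information of those service interfaces on its own device that are connected to service interfaces of other devices;
    the server-side program reporting to an upper-layer program both the interface information of those service interfaces on its own device that are connected to service interfaces of other devices and the interface information it receives, so that the upper-layer program can analyze, based on the interface information, the connection status of the links that realize the corresponding connection relationships.
  2. The method according to claim 1, characterized by further including:
    the server-side program periodically exchanging keep-alive messages with each client program through the network management switch, and, if no keep-alive message from a given client program is received within a specified time, the server-side program determining that the connection with that client program, over which information can be exchanged through the network management switch, has been lost.
  3. The method according to claim 2, characterized in that the server-side program obtaining the connection relationship between the service interfaces of the TOR switch and the service interfaces of each server includes:
    the server-side program shutting down the service interfaces of the TOR switch one at a time, and, after shutting down any service interface of the TOR switch, querying through the network management switch for the server service interface that has lost its connection, and determining that the shut-down TOR switch service interface and the server service interface that lost its connection are connected to each other.
  4. The method according to claim 2, characterized in that the server-side program obtaining the connection relationship between the service interfaces of the TOR switch and the service interfaces of each server includes:
    the server-side program obtaining a preset connection relationship table and reading from it the connection relationship between the service interfaces of the TOR switch and the service interfaces of each server.
  5. The method according to claim 2, characterized in that the server-side program obtaining the connection relationship between the service interfaces of the TOR switch and the service interfaces of each server includes:
    the server-side program querying each client program, through the network management switch, for the MAC addresses of the service interfaces on that client program's device, and establishing the correspondence between the queried MAC addresses of the servers' service interfaces and the MAC addresses of the TOR switch's service interfaces, thereby establishing the connection relationship between the service interfaces of the TOR switch and those of each server.
  6. The method according to claim 1, characterized by further including:
    when the interface information of a service interface of any server changes: if that server runs the server-side program, the server-side program reporting the changed interface information to the upper-layer program; if that server runs a client program, the client program sending the changed interface information to the server-side program, and the server-side program reporting the changed interface information to the upper-layer program.
  7. The method according to claim 6, characterized by further including:
    the server-side program, when communication takes place between a server and the TOR switch, recording the corresponding communication information in a communication log.
  8. The method according to claim 1, characterized in that each device includes both the server-side program and the client program; at any given moment the server-side program can run normally on only one device, and every other device can run its own client program normally.
  9. A link detection system, characterized by including a server-side program and a plurality of client programs, where the server-side program runs on any one device, the client programs run on the devices other than that device, and the devices include a TOR switch and servers; where:
    the server-side program is used to: obtain the connection relationship between the service interfaces of the TOR switch and the service interfaces of each server; periodically send a query message to each client program through a network management switch; and report to an upper-layer program both the interface information of those service interfaces on its own device that are connected to service interfaces of other devices and the interface information it receives, so that the upper-layer program can analyze, based on the interface information, the connection status of the links that realize the corresponding connection relationships;
    each client program is used to: respond to the query message and return to the server-side program, through the network management switch, the interface information of those service interfaces on its own device that are connected to service interfaces of other devices.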
PCT/CN2021/073447 2020-05-29 2021-01-23 Link detection method and system WO2021238263A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/927,344 US11792098B2 (en) 2020-05-29 2021-01-23 Link detection method and system

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010476918.9A CN111740877B (zh) 2020-05-29 2020-05-29 Link detection method and system
CN202010476918.9 2020-05-29

Publications (1)

Publication Number Publication Date
WO2021238263A1 true WO2021238263A1 (zh) 2021-12-02

Family

ID=72647980

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/073447 WO2021238263A1 (zh) 2020-05-29 2021-01-23 Link detection method and system

Country Status (3)

Country Link
US (1) US11792098B2 (zh)
CN (1) CN111740877B (zh)
WO (1) WO2021238263A1 (zh)

Also Published As

Publication number Publication date
US11792098B2 (en) 2023-10-17
US20230198874A1 (en) 2023-06-22
CN111740877A (zh) 2020-10-02
CN111740877B (zh) 2021-08-10

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21813033

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21813033

Country of ref document: EP

Kind code of ref document: A1