CN117527637B - Method and device for detecting link failure, storage medium and electronic equipment - Google Patents

Method and device for detecting link failure, storage medium and electronic equipment

Info

Publication number
CN117527637B
Authority
CN
China
Prior art keywords
processor
data packet
detection data
fault detection
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311865723.3A
Other languages
Chinese (zh)
Other versions
CN117527637A (en)
Inventor
Name withheld on request
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Moore Threads Technology Co Ltd
Original Assignee
Moore Threads Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Moore Threads Technology Co Ltd filed Critical Moore Threads Technology Co Ltd
Priority to CN202311865723.3A priority Critical patent/CN117527637B/en
Publication of CN117527637A publication Critical patent/CN117527637A/en
Application granted granted Critical
Publication of CN117527637B publication Critical patent/CN117527637B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0805 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability
    • H04L43/0811 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability by checking connectivity
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0677 Localisation of faults

Abstract

The specification discloses a method, apparatus, storage medium and electronic device for link failure detection, wherein a first processor communicates with a second processor through a high-speed interconnection interface. Since the high-speed interconnection interface between two connected processors may be abnormal not only in physical connection state but also in communication connection state, the first processor generates a failure detection data packet for detecting the communication connection state, and sends the failure detection data packet to the second processor, so as to detect the communication connection state of the high-speed interconnection interface according to the first failure detection data packet of the first processor and the response of the second processor based on the first failure detection data packet, and determine whether the link between the two connected processors has a failure.

Description

Method and device for detecting link failure, storage medium and electronic equipment
Technical Field
The present disclosure relates to the field of computers, and in particular, to a method, an apparatus, a storage medium, and an electronic device for detecting a link failure.
Background
With the development of computer technology and growing demand, processing services with a single processor may be inefficient, so multiple processors may be connected to improve the efficiency of service execution. Because a link failure between two interconnected processors affects the progress of a service, it is necessary to detect, once the processors have been connected, whether the link between any two interconnected processors has failed. If the two processors are interconnected by a high-speed interconnect technology, link failures may be detected through the communication protocol. However, link connection faults between two connected processors are of many types, and some of them may not be detectable through the communication protocol alone. For example, a connection fault that arises in the receiving direction and the sending direction of the two connected processors is also a link connection fault, yet it may go undetected by the communication protocol, which causes problems in subsequent data transmission and affects service execution.
Based on this, the present specification provides a method of link failure detection.
Disclosure of Invention
The present disclosure provides a method, an apparatus, a storage medium, and an electronic device for detecting a link failure, so as to at least partially solve the foregoing problems in the prior art.
The technical scheme adopted in the specification is as follows:
The present specification provides a method of link failure detection, in which a first processor communicates with a second processor through a high-speed interconnect interface, the method comprising the following steps:
generating a first fault detection data packet for detecting link faults according to a preset fault detection data generation format;
transmitting the first failure detection data packet to the second processor through the high-speed interconnection interface;
and detecting the communication connection state of the high-speed interconnection interface of the first processor according to the first fault detection data packet and the response of the second processor based on the first fault detection data packet.
Optionally, before detecting the communication connection state of the high-speed interconnection interface of the first processor according to the first fault detection data packet and the response of the second processor based on the first fault detection data packet, the method further includes:
detecting a physical connection state of a high-speed interconnection interface of the first processor;
and when the physical connection state of the high-speed interconnection interface of the first processor is normal, detecting the communication connection state of the high-speed interconnection interface of the first processor according to the first fault detection data packet and the response of the second processor based on the first fault detection data packet.
Optionally, the generating the first failure detection data packet for detecting the link failure according to the preset failure detection data generation format specifically includes:
determining first source information in first transceiving information according to the information of the first processor, wherein the first source information comprises first source port information of the first processor and an identifier of the first processor;
determining first target information in the first transceiving information according to the information of the second processor, wherein the first target information comprises first target port information of the second processor and an identifier of the second processor;
and obtaining a first fault detection data packet for detecting the link fault according to the first transceiving information.
Optionally, detecting, according to the first failure detection data packet and the response of the second processor based on the first failure detection data packet, a communication connection state of a high-speed interconnection interface of the first processor specifically includes:
when the first processor does not receive, within a preset duration, the second fault detection data packet sent by the second processor for the first fault detection data packet, detecting that the communication connection state of the high-speed interconnection interface of the first processor is a single-pass abnormality.
Optionally, detecting, according to the first failure detection data packet and the response of the second processor based on the first failure detection data packet, a communication connection state of a high-speed interconnection interface of the first processor specifically includes:
acquiring first data packet content in the first fault detection data packet;
receiving a second fault detection data packet returned by the second processor based on the first fault detection data packet; acquiring the content of a second data packet in the second fault detection data packet;
and detecting that the communication connection state of the high-speed interconnection interface of the first processor is a loopback abnormality when the first data packet content and the second data packet content meet a first specified condition.
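The content-based check above can be sketched as follows. The specification does not define the "first specified condition"; this minimal sketch assumes it means the returned content is byte-for-byte identical to the content that was sent, which would suggest that the first processor received its own packet back instead of a reply processed by the second processor. The function name `is_content_loopback` and the sample values are hypothetical.

```python
def is_content_loopback(first_content: bytes, second_content: bytes) -> bool:
    """Return True when the reply content meets the assumed first specified condition.

    Assumption (not stated in the specification): the condition is plain
    equality, i.e. the reply was not transformed by the peer's preset rule.
    """
    return first_content == second_content


sent = b"\x5a\xa5"
echoed = sent                                # what a looped-back packet would carry
processed = bytes(b ^ 0xFF for b in sent)    # hypothetical peer-side transformation

print(is_content_loopback(sent, echoed))
print(is_content_loopback(sent, processed))
```

Any other comparison rule could be substituted, provided both processors agree on the preset rule used to produce the second data packet content.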
Optionally, detecting, according to the first failure detection data packet and the response of the second processor based on the first failure detection data packet, a communication connection state of a high-speed interconnection interface of the first processor specifically includes:
acquiring the first transceiving information in the first fault detection data packet;
receiving a second fault detection data packet returned by the second processor based on the first fault detection data packet; acquiring second transceiving information in the second fault detection data packet;
and when the first transceiving information and the second transceiving information do not meet a second specified condition, detecting that the communication connection state of the high-speed interconnection interface of the first processor is a loopback abnormality.
Optionally, detecting a communication connection state of the high-speed interconnection interface of the first processor specifically includes:
if the first source information matches the second target information and the first target information matches the second source information, the first transceiving information and the second transceiving information meet the second specified condition, and the communication connection state of the high-speed interconnection interface of the first processor is detected to be normal;
if the first source information does not match the second target information, or the first target information does not match the second source information, the first transceiving information and the second transceiving information do not meet the second specified condition, and the communication connection state of the high-speed interconnection interface of the first processor is detected to be a loopback abnormality.
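The matching rule above can be sketched as follows. This is a minimal illustration, not the claimed implementation: the class names `Endpoint` and `TransceivingInfo`, the function names, and the integer identifiers are all hypothetical, and "matches" is assumed to mean equality of processor identifier and port number.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    processor_id: int  # identifier of the processor
    port: int          # port number on that processor


@dataclass(frozen=True)
class TransceivingInfo:
    source: Endpoint
    target: Endpoint


def meets_second_condition(first: TransceivingInfo, second: TransceivingInfo) -> bool:
    """The reply must come from the packet's target and be addressed to its source."""
    return first.source == second.target and first.target == second.source


def classify(first: TransceivingInfo, second: TransceivingInfo) -> str:
    return "normal" if meets_second_condition(first, second) else "loopback abnormality"


gpu0 = Endpoint(processor_id=0, port=1)
gpu1 = Endpoint(processor_id=1, port=1)
sent = TransceivingInfo(source=gpu0, target=gpu1)

print(classify(sent, TransceivingInfo(source=gpu1, target=gpu0)))  # reply from the peer
print(classify(sent, sent))                                        # own packet came back
```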
Optionally, the high-speed interconnection interface includes a sending buffer area, a receiving buffer area, a link controller and a link physical layer.
The present specification provides a method of link failure detection, in which a first processor communicates with a second processor through a high-speed interconnect interface, the method comprising the following steps:
the second processor receives a first fault detection data packet sent by the first processor through the high-speed interconnection interface, wherein the first fault detection data packet is sent when the physical connection state of the high-speed interconnection interface of the first processor is normal;
and responding to the first processor according to the first fault detection data packet, so that the first processor detects the communication connection state of the high-speed interconnection interface of the first processor according to the first fault detection data packet and the response of the second processor based on the first fault detection data packet.
Optionally, responding to the first processor according to the first fault detection data packet specifically includes:
acquiring first data packet content in the first fault detection data packet;
processing the first data packet content according to a preset rule to obtain a second data packet content;
generating a second fault detection data packet according to the content of the second data packet and a preset fault detection data generation format;
and sending the second fault detection data packet to the first processor, so that the first processor detects the communication connection state of the high-speed interconnection interface of the first processor according to the second fault detection data packet and the first fault detection data packet.
The present specification provides an apparatus for link failure detection, in which a first processor communicates with a second processor through a high-speed interconnect interface, the apparatus comprising:
the first fault detection data packet generation module is used for generating a first fault detection data packet for detecting link faults according to a preset fault detection data generation format;
the first fault detection data packet sending module is used for sending the first fault detection data packet to the second processor through the high-speed interconnection interface;
and the communication connection state detection module is used for detecting the communication connection state of the high-speed interconnection interface of the first processor according to the first fault detection data packet and the response of the second processor based on the first fault detection data packet.
Optionally, the apparatus further comprises:
the physical connection state detection module is used for detecting the physical connection state of the high-speed interconnection interface of the first processor; and when the physical connection state of the high-speed interconnection interface of the first processor is normal, detecting the communication connection state of the high-speed interconnection interface of the first processor according to the first fault detection data packet and the response of the second processor based on the first fault detection data packet.
Optionally, the first failure detection data packet generating module is specifically configured to determine, according to information of the first processor, first source information in first transceiving information, where the first source information includes first source port information of the first processor and an identifier of the first processor; determine first target information in the first transceiving information according to the information of the second processor, where the first target information includes first target port information of the second processor and an identifier of the second processor; and obtain a first fault detection data packet for detecting the link fault according to the first transceiving information.
Optionally, the communication connection state detection module is specifically configured to detect that the communication connection state of the high-speed interconnection interface of the first processor is a single-pass exception when the first processor does not receive the second fault detection data packet sent by the second processor for the first fault detection data packet within a preset duration.
Optionally, the first failure detection data packet generating module is specifically configured to determine content to be sent; determining the content of a first data packet according to the content to be sent; generating a first fault detection data packet according to the content of the first data packet;
the communication connection state detection module is specifically configured to obtain a first data packet content in the first failure detection data packet; receiving a second fault detection data packet returned by the second processor based on the first fault detection data packet; acquiring the content of a second data packet in the second fault detection data packet; and detecting that the communication connection state of the high-speed interconnection interface of the first processor is a loopback abnormality when the first data packet content and the second data packet content meet a first specified condition.
Optionally, the communication connection state detection module is specifically configured to obtain the first transceiving information in the first failure detection data packet; receiving a second fault detection data packet returned by the second processor based on the first fault detection data packet; acquiring second transceiving information in the second fault detection data packet; and when the first transceiving information and the second transceiving information do not meet a second specified condition, detecting that the communication connection state of the high-speed interconnection interface of the first processor is a loopback abnormality.
Optionally, the communication connection state detection module is specifically configured to: if the first source information matches the second target information and the first target information matches the second source information, determine that the first transceiving information and the second transceiving information meet the second specified condition, and detect that the communication connection state of the high-speed interconnection interface of the first processor is normal; and if the first source information does not match the second target information, or the first target information does not match the second source information, determine that the first transceiving information and the second transceiving information do not meet the second specified condition, and detect that the communication connection state of the high-speed interconnection interface of the first processor is a loopback abnormality.
Optionally, the high-speed interconnection interface includes a sending buffer area, a receiving buffer area, a link controller and a link physical layer.
The present specification provides an apparatus for link failure detection, in which a first processor communicates with a second processor through a high-speed interconnect interface, the apparatus comprising:
the first fault detection data packet receiving module is used for receiving, at the second processor, a first fault detection data packet sent by the first processor through the high-speed interconnection interface, wherein the first fault detection data packet is sent when the physical connection state of the high-speed interconnection interface of the first processor is normal;
and the response module is used for responding to the first processor according to the first fault detection data packet, so that the first processor detects the communication connection state of the high-speed interconnection interface of the first processor according to the first fault detection data packet and the response of the second processor based on the first fault detection data packet.
Optionally, the response module is specifically configured to obtain a first packet content in the first failure detection packet; processing the first data packet content according to a preset rule to obtain a second data packet content; generating a second fault detection data packet according to the content of the second data packet and a preset fault detection data generation format; and sending the second fault detection data packet to the first processor, so that the first processor detects the communication connection state of the high-speed interconnection interface of the first processor according to the second fault detection data packet and the first fault detection data packet.
The present specification provides a computer readable storage medium storing a computer program which when executed by a processor implements the method of link failure detection described above.
The present specification provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method of link failure detection described above when executing the program.
At least one of the above technical solutions adopted in this specification can achieve the following beneficial effects:
As can be seen from the method of link failure detection provided in the present specification, not only the physical connection state but also the communication connection state of a high-speed interconnect interface between two connected processors may be abnormal. The method therefore first generates a failure detection packet for detecting the communication connection state, and then detects the communication connection state of the high-speed interconnect interface according to the first failure detection packet of the first processor and the response of the second processor based on the first failure detection packet, so as to determine whether the link between the two connected processors has a failure.
Drawings
The accompanying drawings are included to provide a further understanding of the specification. The exemplary embodiments of the specification and their description serve to explain the specification and are not intended to limit it unduly. In the drawings:
FIG. 1 is a flow chart of a method for detecting link failure provided in the present specification;
FIG. 2 is a schematic diagram of the connection of the high-speed interconnect interfaces of two processors;
FIG. 3 is a schematic diagram of a failure detection packet provided herein;
FIG. 4 is a schematic diagram of a single-pass abnormality of a processor;
FIG. 5 is a schematic diagram of a loopback abnormality of a processor;
FIG. 6 is a flow chart of a method for detecting link failure provided in the present specification;
FIG. 7 is a schematic diagram of an apparatus for link failure detection provided in the present specification;
FIG. 8 is a schematic diagram of the electronic device corresponding to FIG. 1 provided in the present specification.
Detailed Description
To make the objects, technical solutions and advantages of the present specification clearer, the technical solutions of the present specification are described clearly and completely below with reference to specific embodiments and the corresponding drawings. The described embodiments are only some, not all, of the embodiments of the present specification. All other embodiments obtained by one of ordinary skill in the art based on the embodiments herein without creative effort fall within the scope of protection of the present application.
The following describes in detail the technical solutions provided by the embodiments of the present specification with reference to the accompanying drawings.
Fig. 1 is a flow chart of a method for detecting link failure provided in the present specification, specifically including the following steps:
S100: Generate a first fault detection data packet for detecting the link fault according to a preset fault detection data generation format.
Because the link connection state between processors affects subsequent service execution, it needs to be determined before the processors execute a service. If the link connection between two connected processors fails, data cannot be transmitted between them, which may affect the service execution process or even prevent the service from being executed.
Typically, two processors may be connected through a PCIe (Peripheral Component Interconnect Express) bus, which is a high-speed serial computer expansion bus standard, or through a high-speed interconnect interface. If two processors are connected through the high-speed interconnect interface, whether the link between them has failed may be detected through the communication protocol. However, because many types of link fault may exist between two connected processors, some kinds of link connection fault may not be detectable through the communication protocol alone. Accordingly, the present specification provides a method of link failure detection. The execution body of the present specification may be a system management unit (System Management Unit, SMU) for detecting whether a link of a processor has failed, or may be another processor or electronic device capable of detecting whether a link has failed. For convenience of explanation, the method of link failure detection provided in the present specification is described below with the SMU as the execution body.
FIG. 2 is a schematic diagram of the connection of the high-speed interconnect interfaces of two processors.
In one or more embodiments of the present disclosure, one processor may be connected to a plurality of processors. For convenience of explanation, the following takes as an example the case in which one processor is connected to only one other processor through a high-speed interconnect interface, i.e., a first processor communicates with a second processor through a high-speed interconnect interface. As shown in FIG. 2, the two connected processors are a first processor and a second processor, respectively. Each processor has at least one high-speed interconnect interface, and a processor may be a graphics processor (Graphics Processing Unit, GPU); that is, the first processor and the second processor may be graphics processors. A processor may internally include an SMU, PCIe, a high-speed interconnect bus, and so on. The processor may be connected to a host through PCIe to communicate with other processors. The high-speed interconnect interface includes a transmit buffer (Tx FIFO), a receive buffer (Rx FIFO), a link controller (Link Controller), and a link physical layer (Link PHY).
Abnormal conditions of the high-speed interconnection interface include an abnormal physical connection state and an abnormal communication connection state, where an abnormal physical connection state means that the physical circuit connection is abnormal. In general, if the physical connection state of the high-speed interconnect interface of the first processor is abnormal, its communication connection state is also abnormal. A normal physical connection state, however, does not guarantee a normal communication connection state, because the protocol layer, the application layer, or other software layers of the high-speed interconnection interface may still be abnormal. Therefore, when detecting the state of the high-speed interconnection interface of the first processor, the physical connection state may be detected first.
Specifically, the first processor receives a fault detection instruction, that is, an SMU receives a fault detection instruction of a high-speed interconnection interface sent by a host through PCIe, and the SMU detects a physical connection state of the high-speed interconnection interface of the first processor according to the fault detection instruction, and detects a communication connection state of the high-speed interconnection interface of the first processor when the physical connection state of the high-speed interconnection interface of the first processor is normal.
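The two-stage order described above (physical connection state first, communication connection state only if the physical state is normal) can be sketched as follows. The enum `LinkState`, the function `detect_link`, and the two callables are hypothetical names introduced for illustration; how each check is actually performed is left to the caller.

```python
from enum import Enum
from typing import Callable


class LinkState(Enum):
    PHYSICAL_ABNORMAL = "physical connection state abnormal"
    COMMUNICATION_ABNORMAL = "communication connection state abnormal"
    NORMAL = "normal"


def detect_link(physical_ok: Callable[[], bool],
                communication_ok: Callable[[], bool]) -> LinkState:
    """Run the physical check first; probe communication only if it passes."""
    if not physical_ok():
        # No fault detection packet is sent when the physical state is abnormal.
        return LinkState.PHYSICAL_ABNORMAL
    if not communication_ok():
        return LinkState.COMMUNICATION_ABNORMAL
    return LinkState.NORMAL


# Stub checks standing in for the real hardware queries.
print(detect_link(lambda: True, lambda: True).value)
print(detect_link(lambda: True, lambda: False).value)
```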
In one or more embodiments of the present disclosure, the SMU generates a first failure detection packet for detecting a link failure according to a preset failure detection data generation format, so that a subsequent step can detect the communication connection state of the high-speed interconnection interface of the first processor according to the first failure detection packet.
Fig. 3 is a schematic diagram of a fault detection data packet provided in the present disclosure, where, as shown in fig. 3, the fault detection data packet may include transceiving information, packet content and packet type, the transceiving information includes source information and destination information, the source information includes source port information and source processor information, and the destination information includes destination port information and destination processor information. The data packet type characterizes the data packet as a data packet for detecting whether a link is abnormal, the source processor information and the target processor information are identifiers of corresponding processors, and the source port information and the target port information are port numbers of the corresponding processors. The present specification does not limit the failure detection data generation format, that is, the present specification does not limit the order of storage locations of source information, destination information, packet contents, and packet types.
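One possible encoding of such a packet is sketched below. Because the specification leaves the generation format open, every detail here is an assumption made for illustration: the field order, the one-byte field widths, the two-byte content length, the little-endian layout, and the type value `0x01` are hypothetical, as are the names `FaultDetectionPacket`, `HEADER` and `FAULT_DETECTION_TYPE`.

```python
import struct
from dataclasses import dataclass

# Hypothetical header: type, source processor, source port,
# target processor, target port (1 byte each), then a 2-byte content length.
HEADER = struct.Struct("<5BH")
FAULT_DETECTION_TYPE = 0x01  # hypothetical value marking a link-detection packet


@dataclass(frozen=True)
class FaultDetectionPacket:
    src_processor: int
    src_port: int
    dst_processor: int
    dst_port: int
    content: bytes = b""
    packet_type: int = FAULT_DETECTION_TYPE

    def encode(self) -> bytes:
        """Serialize the header followed by the packet content."""
        header = HEADER.pack(self.packet_type, self.src_processor, self.src_port,
                             self.dst_processor, self.dst_port, len(self.content))
        return header + self.content

    @classmethod
    def decode(cls, raw: bytes) -> "FaultDetectionPacket":
        """Parse bytes produced by encode() back into a packet."""
        ptype, sproc, sport, dproc, dport, length = HEADER.unpack_from(raw)
        content = raw[HEADER.size:HEADER.size + length]
        return cls(sproc, sport, dproc, dport, content, ptype)


packet = FaultDetectionPacket(src_processor=0, src_port=1,
                              dst_processor=1, dst_port=1, content=b"\x5a\xa5")
print(FaultDetectionPacket.decode(packet.encode()) == packet)
```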
Then, the SMU may determine first source information in the first transceiving information according to the information of the first processor, where the first source information includes first source port information of the first processor and an identification of the first processor. The SMU determines first target information in the first transceiving information according to the information of the second processor, where the first target information includes first target port information and first target processor information. Once the first transceiving information is obtained, the first fault detection data packet for detecting the link fault is obtained from it.
In addition, the SMU may determine the content to be sent, and determine the first packet content according to the content to be sent. And generating a first fault detection data packet according to the content of the first data packet.
The present specification does not limit the information included in the failure detection packet. For example, the failure detection packet may include only source information and destination information, or may include only packet contents, as long as whether the communication connection state of the high-speed interconnect interface of the first processor is normal can be determined according to the failure detection packet.
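The optional composition of the first packet can be sketched as follows, assuming a plain dictionary representation. The function `build_first_packet`, the key names, and the `(processor identifier, port)` tuples are hypothetical; the only constraint taken from the text is that the packet carries transceiving information, packet content, or both.

```python
from typing import Optional, Tuple

Endpoint = Tuple[int, int]  # (processor identifier, port number), hypothetical encoding


def build_first_packet(local: Optional[Endpoint] = None,
                       peer: Optional[Endpoint] = None,
                       content: Optional[bytes] = None) -> dict:
    """Build a first fault detection packet with addresses, content, or both."""
    if (local is None) != (peer is None):
        raise ValueError("source and target information must be given together")
    if local is None and content is None:
        raise ValueError("packet needs transceiving information or content")

    packet = {"type": "fault_detection"}
    if local is not None and peer is not None:
        packet["source"] = {"processor": local[0], "port": local[1]}
        packet["target"] = {"processor": peer[0], "port": peer[1]}
    if content is not None:
        packet["content"] = content
    return packet


print(build_first_packet(local=(0, 1), peer=(1, 1)))
print(build_first_packet(content=b"\x5a\xa5"))
```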
S102: Send the first fault detection data packet to the second processor through the high-speed interconnection interface.
Specifically, as shown in fig. 2, the SMU sends the first failure detection data packet to the Tx FIFO through the bus, then sends the first failure detection data packet to the Rx (receiving port) in the Link PHY of the second processor through the Tx (transmitting port) in the Link Controller and the Tx in the Link PHY in sequence, and finally sends the first failure detection data packet to the SMU of the second processor through the Rx, the Rx FIFO and the bus in the Link Controller of the second processor in sequence.
S104: Detect the communication connection state of the high-speed interconnection interface of the first processor according to the first fault detection data packet and the response of the second processor based on the first fault detection data packet.
After the SMU sends the first fault detection data packet to the second processor, the second processor may respond correspondingly to the first fault detection data packet, for example, parse information in the first fault detection data packet, process the information, generate a second fault detection data packet according to the processed information, and return the second fault detection data packet to the first processor. Thus, the SMU may detect the communication connection status of the high-speed interconnect interface of the first processor based on the response of the second processor.
Specifically, an abnormal communication connection state of the high-speed interconnection interface of the first processor may be a single-pass abnormality or a loopback abnormality. Fig. 4 is a schematic structural diagram of a single-pass abnormality. A single-pass abnormality means that the first processor sends data to the second processor, but the second processor, after receiving the data, fails to return its response content to the first processor, so that the first processor does not receive the response content returned by the second processor. As shown in fig. 4, if any one or more of the connection lines marked with an "x" in the first processor and the second processor fail, the second processor cannot respond to the first processor. Therefore, the SMU needs to detect, according to the response content of the second processor, whether the communication connection state of the high-speed interconnection interface of the first processor is a single-pass abnormality: when the first processor does not receive, within a preset duration, the second fault detection data packet sent by the second processor in response to the first fault detection data packet, the communication connection state of the high-speed interconnection interface of the first processor is detected to be a single-pass abnormality. It should be noted that, if the data content returned by the second processor is empty, this may also indicate that the communication connection state of the high-speed interconnection interface of the first processor is a single-pass abnormality.
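The timeout-based check described above can be sketched as follows. This is a minimal illustration under assumptions: a standard-library `queue.Queue` stands in for the receive path (Rx FIFO) of the first processor, and the function name and return strings are hypothetical.

```python
import queue


def detect_single_pass(rx_queue: "queue.Queue[bytes]", timeout_s: float) -> str:
    """Wait up to timeout_s for the second fault detection packet.

    rx_queue is a hypothetical stand-in for the receive path of the
    first processor; timeout_s plays the role of the preset duration.
    """
    try:
        reply = rx_queue.get(timeout=timeout_s)
    except queue.Empty:
        # Nothing arrived within the preset duration.
        return "single_pass_abnormal"
    if not reply:
        # An empty reply is also treated as a single-pass abnormality.
        return "single_pass_abnormal"
    return "reply_received"


# No reply at all within the preset duration.
silent = queue.Queue()
result_silent = detect_single_pass(silent, timeout_s=0.01)

# A reply whose data content is empty.
empty = queue.Queue()
empty.put(b"")
result_empty = detect_single_pass(empty, timeout_s=0.01)

# A non-empty reply; further checks (loopback) would follow.
answered = queue.Queue()
answered.put(b"\x02")
result_answered = detect_single_pass(answered, timeout_s=0.01)
```

A `reply_received` result only rules out the single-pass abnormality; the loopback checks still have to be applied to the reply.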
Fig. 5 is a schematic structural diagram of a processor loopback abnormality. As shown in fig. 5, a loopback abnormality means that the data content received by the processor is consistent with the data content sent by the processor. The cause of a loopback abnormality may be that, when the processor is physically connected, the receiving direction of the processor is connected to its own sending direction, or that a software layer of the high-speed interconnection interface has failed. In this case, the physical connection state of the high-speed interconnect interface may still be detected as normal, and the first processor may still receive a second fault detection data packet. Therefore, in order to detect whether the processor has a loopback abnormality, the SMU may detect the communication connection state of the high-speed interconnect interface of the first processor according to the second data packet content in the second fault detection data packet and the first data packet content in the first fault detection data packet.
First, the SMU acquires the first data packet content in the first fault detection data packet, receives the second fault detection data packet returned by the second processor based on the first fault detection data packet, and acquires the second data packet content in the second fault detection data packet. When the first data packet content and the second data packet content meet a first specified condition, the communication connection state of the high-speed interconnection interface of the first processor is detected to be a loopback abnormality. The first specified condition may be set according to a preset rule. For example, if the preset rule is to add one to the value of the first data packet content, then the second processor, after receiving the first fault detection data packet, processes it according to the preset rule, that is, adds one to the value of the first data packet content in the first fault detection data packet to obtain the second data packet content. In that case, the first specified condition is that the result of subtracting the value of the first data packet content from the value of the second data packet content is not 1.
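The content-based loopback check can be sketched as below, assuming the add-one preset rule used in the example above. The function names are hypothetical, and a real implementation may use any other rule agreed between the two processors.

```python
def preset_rule(value: int) -> int:
    """Assumed preset rule from the example: add one to the packet content."""
    return value + 1


def is_loopback_by_content(first_content: int, second_content: int) -> bool:
    """Return True when the first specified condition is met.

    The condition is met when the reply does not equal the result of
    applying the preset rule to the content that was sent, for instance
    when the sender simply receives its own packet back unchanged.
    """
    return second_content != preset_rule(first_content)


echoed = is_loopback_by_content(first_content=1, second_content=1)   # own packet came back
healthy = is_loopback_by_content(first_content=1, second_content=2)  # peer applied the rule
```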
Based on the method for detecting the link failure shown in fig. 1, since the physical connection state of the high-speed interconnection interface between two connected processors may be abnormal, and the communication connection state may be abnormal, the method firstly judges whether the physical connection state of the high-speed interconnection interface of the first processor is normal, and when the physical connection state of the high-speed interconnection interface is normal, then judges whether the communication connection state of the high-speed interconnection interface is normal according to the first failure detection data packet of the first processor and the response of the second processor based on the first failure detection data packet, so as to detect whether the link between the two connected processors has failure when the communication protocols of the two connected processors are different. That is, the present method is not limited to the type of communication protocol used by the processor.
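The two-stage order of checks summarized above (physical connection state first, communication connection state only if the physical state is normal) can be sketched as follows. The enum, the function name and the two callables are hypothetical placeholders for the hardware-specific checks, which the specification does not define at this level.

```python
from enum import Enum
from typing import Callable


class LinkState(Enum):
    PHYSICAL_ABNORMAL = "physical_abnormal"
    COMMUNICATION_ABNORMAL = "communication_abnormal"
    NORMAL = "normal"


def detect_link(physical_ok: Callable[[], bool],
                communication_ok: Callable[[], bool]) -> LinkState:
    """Run the physical check first; probe communication only if it passes.

    Both callables are hypothetical stand-ins: physical_ok for the
    physical connection state check, communication_ok for the exchange
    of fault detection data packets.
    """
    if not physical_ok():
        return LinkState.PHYSICAL_ABNORMAL
    if not communication_ok():
        return LinkState.COMMUNICATION_ABNORMAL
    return LinkState.NORMAL


state = detect_link(lambda: True, lambda: True)
```

Because the packet exchange works on packet fields rather than on protocol-level state, the same flow applies regardless of which communication protocol the processors use.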
In one or more embodiments of the present disclosure, a first processor and a second processor are connected to a host through a bus, and when a physical connection state and/or a communication connection state of a high-speed interconnect interface of the first processor is abnormal, a high-speed interconnect interface abnormality report is generated, and the high-speed interconnect interface abnormality report is sent to the host, so that the host displays the high-speed interconnect interface abnormality report, so that a user can solve the link failure problem as soon as possible.
For step S104, since the first processor and the second processor are different processors, the sending and receiving information in the first failure detection data packet and the second failure detection data packet sent by the first processor and the second processor respectively are different, and then the SMU can also detect the communication connection state of the high-speed interconnection interface of the first processor through the sending and receiving information in the failure detection data packet.
Specifically, the SMU first acquires first transceiving information in the first fault detection data packet, where the first transceiving information includes first source information and first target information, the first source information includes first source port information and first source processor information, and the first target information includes first target port information and first target processor information. Similarly, the second fault detection data packet includes second transceiving information, where the second transceiving information includes second source information and second target information, the second source information includes second source port information and second source processor information, and the second target information includes second target port information and second target processor information. That is, the source processor information in the first failure detection packet may be referred to as first source processor information, which refers to an identification of the first processor, and the target processor information in the first failure detection packet may be referred to as first target processor information, which refers to an identification of the second processor, the first source port information refers to a port number of the first processor, and the first target port information refers to a port number of the second processor.
The source processor information in the second fault detection data packet may be referred to as second source processor information, which refers to an identification of the second processor, and the target processor information in the second fault detection data packet may be referred to as second target processor information, which refers to an identification of the first processor. The second source port information refers to a port number of the second processor, and the second target port information refers to a port number of the first processor.
And then, receiving a second fault detection data packet returned by the second processor based on the first fault detection data packet, acquiring second receiving and transmitting information in the second fault detection data packet, and detecting that the communication connection state of the high-speed interconnection interface of the first processor is loop-back abnormal when the first receiving and transmitting information and the second receiving and transmitting information do not meet a second specified condition. The second specified condition means that the first source information is matched with the second target information, and the first target information is matched with the second source information. That is, if the first source information matches the second target information and the first target information matches the second source information, the first transceiving information and the second transceiving information satisfy a second specified condition, and the communication connection state of the high-speed interconnection interface of the first processor is detected to be normal. If the first source information is not matched with the second target information or the first target information is not matched with the second source information, the first receiving and transmitting information and the second receiving and transmitting information do not meet a second specified condition, and the communication connection state of the high-speed interconnection interface of the first processor is detected to be loop-back abnormal. Wherein, matching refers to information consistency.
TABLE 1
TABLE 2
For example, as shown in table 1, in the first fault detection data packet, the source processor information, the source port information, the target processor information, and the target port information are 3, 4, 1, and 2, respectively, the packet content is 1, and the packet type is 7, which indicates that the data packet is a fault detection data packet. As shown in table 2, in the second fault detection data packet, the source processor information, the source port information, the target processor information, and the target port information are 1, 2, 3, and 4, respectively, and the packet content is 4. The first source information is therefore consistent with the second target information, and the first target information is consistent with the second source information, so the first transceiving information and the second transceiving information satisfy the second specified condition, and the communication connection state of the high-speed interconnection interface of the first processor is normal.
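The second specified condition can be checked with a short comparison of the transceiving information. The sketch below uses the values from the Table 1 and Table 2 example; the `TransceiveInfo` type and the function name are hypothetical.

```python
from typing import NamedTuple


class TransceiveInfo(NamedTuple):
    """Hypothetical container for the transceiving information of one packet."""
    src_processor: int
    src_port: int
    dst_processor: int
    dst_port: int


def meets_second_condition(first: TransceiveInfo, second: TransceiveInfo) -> bool:
    """True when first source matches second target and first target matches second source."""
    source_matches = (first.src_processor, first.src_port) == (second.dst_processor, second.dst_port)
    target_matches = (first.dst_processor, first.dst_port) == (second.src_processor, second.src_port)
    return source_matches and target_matches


first_info = TransceiveInfo(3, 4, 1, 2)   # Table 1 example values
second_info = TransceiveInfo(1, 2, 3, 4)  # Table 2 example values

normal = meets_second_condition(first_info, second_info)
# If the first processor receives its own packet back, source and target are not swapped.
looped_back = not meets_second_condition(first_info, first_info)
```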
Based on the flow chart of the method for detecting link failure shown in fig. 1, the present disclosure further provides a method for detecting link failure executed by an SMU of a second processor, where a first processor communicates with the second processor through a high-speed interconnect interface, and fig. 6 is a flow chart of the method for detecting link failure provided in the present disclosure, specifically including the following steps:
S200: The second processor receives the first fault detection data packet sent by the first processor through the high-speed interconnection interface.
It should be noted that, the first failure detection packet is sent when the physical connection state of the high-speed interconnection interface of the first processor is normal.
S202: Respond to the first processor according to the first fault detection data packet.
Specifically, the SMU acquires the first data packet content in the first fault detection data packet, and processes the first data packet content according to a preset rule to obtain the second data packet content. The preset rule needs to correspond to the first specified condition in step S104. For example, if the value of the first data packet content is 6 and the first specified condition is that the values of the first data packet content and the second data packet content differ by 1, then the value of the second data packet content obtained by processing the first data packet content according to the preset rule cannot be 7 or 5. It should be noted that the preset rule may differ between processors, and that, when there is no failure in the link between two connected processors, the first data packet content and the second data packet content do not meet the first specified condition.
And then, generating a second fault detection data packet according to the content of the second data packet and a preset fault detection data generation format. The second source information of the second receiving and transmitting information is determined according to the information of the second processor, the second source information comprises the second source port information of the second processor and the identifier of the second processor, the second target information in the second receiving and transmitting information is determined according to the information of the first processor, the second target information comprises the second target port information of the first processor and the identifier of the first processor, and the second fault detection data packet is obtained after the second receiving and transmitting information is obtained. The second failure detection packet may further include packet content, and then the SMU may determine, according to the first packet content, second packet content to be returned, and generate a second failure detection packet according to the second packet content.
And finally, sending the second fault detection data packet to the first processor so that the first processor detects the communication connection state of the high-speed interconnection interface of the first processor according to the second fault detection data packet and the first fault detection data packet. The first processor detects the communication connection state of the high-speed interconnection interface of the first processor according to the first fault detection data packet and the response of the second processor based on the first fault detection data packet.
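The responder side (steps S200 and S202) can be sketched as follows, assuming the add-one preset rule from the earlier example and plain dictionaries as a stand-in for the packet format. The key names and the `respond` function are hypothetical.

```python
def respond(first_packet: dict, local_processor: int, local_port: int) -> dict:
    """Build the second fault detection packet on the second processor.

    Source information comes from the second processor itself; target
    information is taken from the source fields of the received packet.
    The add-one rule is the assumed preset rule from the example.
    """
    return {
        "src_processor": local_processor,
        "src_port": local_port,
        "dst_processor": first_packet["src_processor"],
        "dst_port": first_packet["src_port"],
        "content": first_packet["content"] + 1,
        "type": first_packet["type"],
    }


# First packet as in the Table 1 example: processor 3 / port 4 to processor 1 / port 2.
first_packet = {
    "src_processor": 3, "src_port": 4,
    "dst_processor": 1, "dst_port": 2,
    "content": 1, "type": 7,
}
second_packet = respond(first_packet, local_processor=1, local_port=2)
```

Filling the source fields from the responder's own identity, rather than copying the received target fields, means a misrouted packet produces a reply whose source does not match the first target information, which the first processor can then detect.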
When one processor is connected to a plurality of processors, the above-described link failure detection method may be used to detect the link connection state between the processor and any one of the other processors. For example, in one possible implementation, the first processor is connected to the second processor and the third processor at the same time, and the first processor may generate two failure detection data packets simultaneously, including the first failure detection data packet and the third failure detection data packet, and send the first failure detection data packet to the second processor, and send the third failure detection data packet to the third processor, so as to detect the link connection state of the first processor and the second processor and the link connection state of the first processor and the third processor, respectively. In another possible implementation, the multiple processors may be tested pairwise according to the above method.
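Both implementations described above amount to enumerating the links to probe. The following sketch lists the probes for the fan-out case and for the pairwise case; the function names and integer processor identifiers are hypothetical.

```python
from itertools import combinations
from typing import Iterable, List, Tuple


def fan_out_links(first: int, peers: Iterable[int]) -> List[Tuple[int, int]]:
    """One probe per peer: the first processor sends a separate packet to each."""
    return [(first, peer) for peer in peers]


def pairwise_links(processor_ids: Iterable[int]) -> List[Tuple[int, int]]:
    """Every unordered pair of processors, for pairwise testing."""
    return list(combinations(sorted(processor_ids), 2))


fan_out = fan_out_links(1, [2, 3])    # first processor probes the second and the third
pairs = pairwise_links([0, 1, 2])     # all pairs among three processors
```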
The detection method disclosed in this specification is applied to communication link detection when at least two GPU chips are interconnected. The first processor may be one GPU chip in the link, referred to as chip 0; when there are multiple chips, the chips may be named sequentially by chip identifier. The second processor may be another GPU chip in the link, referred to as chip 1, chip n, or the like. The first processor may be directly interconnected with the second processor through a high-speed interconnection interface, or may be interconnected with other chips through the high-speed interconnection interface and then further interconnected with the second processor.
In the case where the first processor is directly interconnected with the second processor through the high-speed interconnect interface, the link detection may be performed by the method described above. In the case where the first processor is interconnected with other chips through the high-speed interconnection interface and then interconnected with the second processor, when the link communication between the first processor and the second processor is normal and the high-speed interconnection interface is normal, the first processor may generate a first fault detection data packet through the SMU and then send the first fault detection data packet to the second processor through the high-speed interconnection interface and the other chips. When detecting whether the first data packet content and the second data packet content meet the first specified condition, the first specified condition may be based on adding M to the value of the first data packet content, where M is the number of intervening chips between the first processor and the second processor plus one. Other changes to the data content may also be used; for example, the content may be changed to the chip identifier corresponding to the position of the second processor, another specified value associated with the ordering position of the second processor may be added to the original first data packet content, or the content may be encoded.
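For the multi-chip case, the add-M variant can be sketched as below. The specification only states that M is the number of intervening chips plus one; how the increments are applied along the path is not specified, so this sketch only computes the expected final value. Function names are hypothetical.

```python
def expected_second_content(first_content: int, intervening_chips: int) -> int:
    """Expected reply value under the add-M rule, with M = intervening chips + 1."""
    m = intervening_chips + 1
    return first_content + m


def is_abnormal(first_content: int, second_content: int, intervening_chips: int) -> bool:
    """True when the reply does not match the add-M expectation."""
    return second_content != expected_second_content(first_content, intervening_chips)


direct = expected_second_content(first_content=1, intervening_chips=0)    # directly connected
two_hops = expected_second_content(first_content=1, intervening_chips=2)  # two chips in between
```

With no intervening chips, M is 1 and the rule reduces to the add-one example used for directly connected processors.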
The above is the method for detecting link failure provided by one or more embodiments of the present specification. Based on the same concept, the present specification further provides a corresponding device for detecting link failure, as shown in fig. 7.
FIG. 7 is a schematic diagram of an apparatus for link failure detection provided herein, a first processor in communication with a second processor via a high-speed interconnect interface; the device comprises:
a first failure detection data packet generation module 700, configured to generate a first failure detection data packet for detecting a link failure according to a preset failure detection data generation format;
a first failure detection data packet sending module 702, configured to send the first failure detection data packet to the second processor through the high-speed interconnection interface;
and the communication connection state detection module 704 is configured to detect a communication connection state of the high-speed interconnection interface of the first processor according to the first failure detection data packet and the response of the second processor based on the first failure detection data packet.
Optionally, the apparatus further comprises:
a physical connection state detection module 706, configured to detect a physical connection state of a high-speed interconnect interface of the first processor; and when the physical connection state of the high-speed interconnection interface of the first processor is normal, detecting the communication connection state of the high-speed interconnection interface of the first processor according to the first fault detection data packet and the response of the second processor based on the first fault detection data packet.
Optionally, the first failure detection data packet generating module 700 is specifically configured to determine, according to the information of the first processor, first source information in first transceiving information, where the first source information includes first source port information of the first processor and an identifier of the first processor; determining first target information in first receiving and transmitting information according to the information of the second processor, wherein the first target information comprises first target port information of the second processor and an identifier of the second processor; and obtaining a first fault detection data packet for detecting the link fault according to the first transceiving information.
Optionally, the communication connection state detection module 704 is specifically configured to detect that the communication connection state of the high-speed interconnection interface of the first processor is a single-pass exception when the first processor does not receive the second fault detection data packet sent by the second processor for the first fault detection data packet within a preset duration.
Optionally, the first failure detection packet generation module 700 is specifically configured to determine content to be sent; determining the content of a first data packet according to the content to be sent; generating a first fault detection data packet according to the content of the first data packet;
The communication connection state detection module 704 is specifically configured to obtain a first packet content in the first failure detection packet; receiving a second fault detection data packet returned by the second processor based on the first fault detection data packet; acquiring the content of a second data packet in the second fault detection data packet; and detecting that the communication connection state of the high-speed interconnection interface of the first processor is a loopback abnormality when the first data packet content and the second data packet content meet a first specified condition.
Optionally, the communication connection state detection module 704 is specifically configured to obtain the first transceiving information in the first failure detection data packet; receiving a second fault detection data packet returned by the second processor based on the first fault detection data packet; acquiring second transceiving information in the second fault detection data packet; and when the first transceiving information and the second transceiving information do not meet a second specified condition, detecting that the communication connection state of the high-speed interconnection interface of the first processor is a loopback abnormality.
Optionally, the communication connection state detection module 704 is specifically configured to detect that the communication connection state of the high-speed interconnection interface of the first processor is normal if the first source information is matched with the second target information and the first target information is matched with the second source information, and the first transceiving information and the second transceiving information meet a second specified condition; and if the first source information is not matched with the second target information or the first target information is not matched with the second source information, the first receiving and transmitting information and the second receiving and transmitting information do not meet a second specified condition, and the communication connection state of the high-speed interconnection interface of the first processor is detected to be loop-back abnormal.
Optionally, the high-speed interconnection interface includes a sending buffer area, a receiving buffer area, a link controller and a link physical layer.
The present specification provides an apparatus for link failure detection, a first processor in communication with a second processor through a high-speed interconnect interface; the device comprises:
the first fault detection data packet receiving module is used for receiving a first fault detection data packet sent by the first processor through the high-speed interconnection interface by the second processor, wherein the first fault detection data packet is sent when the physical connection state of the high-speed interconnection interface of the first processor is normal;
and the response module is used for responding to the first processor according to the first fault detection data packet so that the first processor detects the communication connection state of the high-speed interconnection interface of the first processor according to the first fault detection data packet and the response of the second processor based on the first fault detection data packet.
Optionally, the response module is specifically configured to obtain a first packet content in the first failure detection packet; processing the first data packet content according to a preset rule to obtain a second data packet content; generating a second fault detection data packet according to the content of the second data packet and a preset fault detection data generation format; and sending the second fault detection data packet to the first processor, so that the first processor detects the communication connection state of the high-speed interconnection interface of the first processor according to the second fault detection data packet and the first fault detection data packet.
The present specification also provides a computer readable storage medium storing a computer program operable to perform the method of link failure detection provided in fig. 1 above.
The present specification also provides a schematic structural diagram of the electronic device, shown in fig. 8. As shown in fig. 8, at the hardware level, the electronic device includes a processor, an internal bus, a network interface, a memory, and a nonvolatile memory, and may of course include hardware required by other services. The processor reads the corresponding computer program from the nonvolatile memory into the memory and then runs it to implement the method for detecting link failure described in fig. 1. Of course, other implementations, such as logic devices or combinations of hardware and software, are not excluded from the present specification; that is, the execution subject of the processing flow is not limited to logic units, but may also be hardware or logic devices.
In the 1990s, an improvement to a technology could be clearly distinguished as an improvement in hardware (e.g., an improvement to a circuit structure such as a diode, transistor, or switch) or an improvement in software (an improvement to a method flow). However, with the development of technology, many improvements to method flows today can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement to a method flow cannot be realized by a hardware entity module. For example, a programmable logic device (Programmable Logic Device, PLD), such as a field programmable gate array (Field Programmable Gate Array, FPGA), is an integrated circuit whose logic function is determined by the user's programming of the device. A designer programs to "integrate" a digital system onto a PLD without requiring the chip manufacturer to design and fabricate an application-specific integrated circuit chip. Moreover, nowadays, instead of manually manufacturing integrated circuit chips, such programming is mostly implemented using "logic compiler" software, which is similar to the software compiler used in program development, and the source code to be compiled must be written in a specific programming language called a hardware description language (Hardware Description Language, HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are currently the most commonly used.
It will also be apparent to those skilled in the art that a hardware circuit implementing the logic method flow can be readily obtained merely by logically programming the method flow in one of the hardware description languages described above and programming it into an integrated circuit.
The controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer readable medium storing computer readable program code (e.g., software or firmware) executable by the (micro)processor, or the form of logic gates, switches, application specific integrated circuits (Application Specific Integrated Circuit, ASIC), programmable logic controllers, or embedded microcontrollers. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller may also be implemented as part of the control logic of the memory. Those skilled in the art will also appreciate that, in addition to implementing the controller purely as computer readable program code, it is entirely possible to implement the same functionality by logically programming the method steps such that the controller takes the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller may thus be regarded as a hardware component, and the means included therein for performing various functions may also be regarded as structures within the hardware component. The means for performing the various functions may even be regarded both as software modules implementing the method and as structures within the hardware component.
The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. One typical implementation is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above devices are described as being functionally divided into various units. Of course, when implementing the present specification, the functions of each unit may be implemented in one or more pieces of software and/or hardware.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include, among computer-readable media, volatile memory such as random access memory (RAM) and/or non-volatile memory such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include both permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
It will be appreciated by those skilled in the art that embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, the present specification may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present description can take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
This specification may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The specification may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
In this specification, the embodiments are described in a progressive manner; for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, the system embodiments are described relatively briefly because they are substantially similar to the method embodiments; for relevant details, refer to the corresponding description of the method embodiments.
The foregoing is merely exemplary of the present disclosure and is not intended to limit the disclosure. Various modifications and alterations to this specification will become apparent to those skilled in the art. Any modifications, equivalent substitutions, improvements, or the like, which are within the spirit and principles of the present description, are intended to be included within the scope of the claims of the present application.

Claims (11)

1. A method of link failure detection, wherein a first processor communicates with a second processor through a high-speed interconnect interface, the first processor and the second processor comprising a graphics processor; the method comprises the following steps:
generating a first fault detection data packet for detecting link faults according to a preset fault detection data generation format;
transmitting the first fault detection data packet to the second processor through the high-speed interconnection interface;
detecting a communication connection state of a high-speed interconnection interface of the first processor according to the first fault detection data packet and the response of the second processor based on the first fault detection data packet, wherein the communication connection state abnormality comprises a software layer abnormality of the high-speed interconnection interface;
before detecting the communication connection state of the high-speed interconnection interface of the first processor according to the first fault detection data packet and the response of the second processor based on the first fault detection data packet, the method further includes:
detecting a physical connection state of a high-speed interconnection interface of the first processor;
and when the physical connection state of the high-speed interconnection interface of the first processor is normal, detecting the communication connection state of the high-speed interconnection interface of the first processor according to the first fault detection data packet and the response of the second processor based on the first fault detection data packet.
2. The method of claim 1, wherein generating the first fault detection data packet for detecting link faults according to the preset fault detection data generation format specifically comprises:
determining first source information in first transceiving information according to information of the first processor, wherein the first source information comprises first source port information of the first processor and an identifier of the first processor;
determining first target information in the first transceiving information according to information of the second processor, wherein the first target information comprises first target port information of the second processor and an identifier of the second processor;
and obtaining a first fault detection data packet for detecting the link fault according to the first transceiving information.
3. The method of claim 1, wherein detecting the communication connection state of the high-speed interconnection interface of the first processor according to the first fault detection data packet and the response of the second processor based on the first fault detection data packet specifically comprises:
when the first processor does not receive, within a preset time, a second fault detection data packet sent by the second processor in response to the first fault detection data packet, detecting that the communication connection state of the high-speed interconnection interface of the first processor is a single-pass abnormality.
4. The method of claim 1, wherein generating the first fault detection data packet for detecting link faults according to the preset fault detection data generation format specifically comprises:
determining content to be sent; determining the content of a first data packet according to the content to be sent;
generating a first fault detection data packet according to the content of the first data packet;
according to the first fault detection data packet and the response of the second processor based on the first fault detection data packet, detecting the communication connection state of the high-speed interconnection interface of the first processor specifically includes:
acquiring first data packet content in the first fault detection data packet;
receiving a second fault detection data packet returned by the second processor based on the first fault detection data packet; acquiring the content of a second data packet in the second fault detection data packet;
and detecting that the communication connection state of the high-speed interconnection interface of the first processor is a loopback abnormality when the first data packet content and the second data packet content meet a first specified condition.
5. The method of claim 2, wherein detecting the communication connection state of the high-speed interconnection interface of the first processor according to the first fault detection data packet and the response of the second processor based on the first fault detection data packet specifically comprises:
acquiring the first transceiving information in the first fault detection data packet;
receiving a second fault detection data packet returned by the second processor based on the first fault detection data packet; acquiring second transceiving information in the second fault detection data packet;
and when the first transceiving information and the second transceiving information do not meet a second specified condition, detecting that the communication connection state of the high-speed interconnection interface of the first processor is a loopback abnormality.
6. The method of claim 5, wherein detecting the communication connection state of the high-speed interconnection interface of the first processor specifically comprises:
if the first source information matches the second target information and the first target information matches the second source information, the first transceiving information and the second transceiving information meet the second specified condition, and the communication connection state of the high-speed interconnection interface of the first processor is detected to be normal;
and if the first source information does not match the second target information, or the first target information does not match the second source information, the first transceiving information and the second transceiving information do not meet the second specified condition, and the communication connection state of the high-speed interconnection interface of the first processor is detected to be a loopback abnormality.
7. The method of claim 1, wherein the high-speed interconnect interface comprises a transmit buffer, a receive buffer, a link controller, a link physical layer.
8. A method of link failure detection, wherein a first processor communicates with a second processor through a high-speed interconnect interface, the first processor and the second processor comprising a graphics processor; the method comprises the following steps:
the second processor receives a first fault detection data packet sent by the first processor through the high-speed interconnection interface, wherein the first fault detection data packet is sent when the physical connection state of the high-speed interconnection interface of the first processor is normal;
and responding to the first processor according to the first fault detection data packet, so that the first processor detects the communication connection state of the high-speed interconnection interface of the first processor according to the first fault detection data packet and the response of the second processor based on the first fault detection data packet.
9. The method of claim 8, wherein responding to the first processor according to the first fault detection data packet comprises:
acquiring first data packet content in the first fault detection data packet;
processing the first data packet content according to a preset rule to obtain a second data packet content;
generating a second fault detection data packet according to the content of the second data packet and a preset fault detection data generation format;
and sending the second fault detection data packet to the first processor, so that the first processor detects the communication connection state of the high-speed interconnection interface of the first processor according to the second fault detection data packet and the first fault detection data packet.
10. A computer readable storage medium, characterized in that the storage medium stores a computer program which, when executed by a processor, implements the method of any of the preceding claims 1-9.
11. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method of any of the preceding claims 1-9 when executing the program.
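The detection flow recited in claims 1 to 6, together with the responder behaviour of claims 8 and 9, can be illustrated with a minimal Python sketch. It is not the patented implementation. All names here (`Endpoint`, `DetectionPacket`, `detect_link`, `respond`, `apply_preset_rule`) are hypothetical, and the link is abstracted as an `exchange` callback that returns `None` on timeout. Three points are assumptions rather than statements from the claims: the "preset rule" is taken to be a bitwise inversion of the content, the "first specified condition" is taken to be that the returned content is identical to the sent content, and a failed physical check is reported as a separate state.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class LinkState(Enum):
    NORMAL = "normal"
    PHYSICAL_ABNORMAL = "physical connection abnormal"  # label assumed for this sketch
    SINGLE_PASS_ABNORMAL = "single-pass abnormality"
    LOOPBACK_ABNORMAL = "loopback abnormality"


@dataclass(frozen=True)
class Endpoint:
    """Source or target information: processor identifier plus port (claim 2)."""
    processor_id: int
    port: int


@dataclass(frozen=True)
class DetectionPacket:
    """Fault detection data packet: transceiving information plus content."""
    source: Endpoint
    target: Endpoint
    content: bytes


def apply_preset_rule(content: bytes) -> bytes:
    """Assumed 'preset rule' of claim 9: invert every bit of the content."""
    return bytes(b ^ 0xFF for b in content)


def respond(first: DetectionPacket) -> DetectionPacket:
    """Second processor: swap source and target, then transform the content."""
    return DetectionPacket(first.target, first.source,
                           apply_preset_rule(first.content))


def detect_link(
    local: Endpoint,
    remote: Endpoint,
    content: bytes,
    physical_ok: Callable[[], bool],
    exchange: Callable[[DetectionPacket], Optional[DetectionPacket]],
) -> LinkState:
    """First processor: classify the state of its high-speed interconnect link."""
    if not content:
        raise ValueError("content must be non-empty to distinguish a loopback")
    # Claim 1: the physical connection state is checked first.
    if not physical_ok():
        return LinkState.PHYSICAL_ABNORMAL
    first = DetectionPacket(local, remote, content)
    second = exchange(first)  # None models "no reply within the preset time"
    # Claim 3: no second packet within the preset time.
    if second is None:
        return LinkState.SINGLE_PASS_ABNORMAL
    # Claim 4 (assumed condition): content came back unchanged.
    if second.content == first.content:
        return LinkState.LOOPBACK_ABNORMAL
    # Claims 5 and 6: the reply's source/target must mirror the request's.
    if second.source != first.target or second.target != first.source:
        return LinkState.LOOPBACK_ABNORMAL
    return LinkState.NORMAL
```

For example, with `gpu0 = Endpoint(0, 1)` and `gpu1 = Endpoint(1, 2)`, calling `detect_link(gpu0, gpu1, b"\x01\x02", lambda: True, respond)` yields `LinkState.NORMAL`, while an `exchange` that simply echoes the packet back yields `LinkState.LOOPBACK_ABNORMAL`. Keeping the transport behind a callback keeps the classification logic independent of any particular interconnect driver.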
CN202311865723.3A 2023-12-29 2023-12-29 Method and device for detecting link failure, storage medium and electronic equipment Active CN117527637B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311865723.3A CN117527637B (en) 2023-12-29 2023-12-29 Method and device for detecting link failure, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311865723.3A CN117527637B (en) 2023-12-29 2023-12-29 Method and device for detecting link failure, storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN117527637A CN117527637A (en) 2024-02-06
CN117527637B true CN117527637B (en) 2024-04-02

Family

ID=89753397

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311865723.3A Active CN117527637B (en) 2023-12-29 2023-12-29 Method and device for detecting link failure, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN117527637B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1855850A (en) * 2005-04-19 2006-11-01 华为技术有限公司 Round trip method
CN101035028A (en) * 2007-01-18 2007-09-12 华为技术有限公司 Access error detection method and network device
CN108616418A (en) * 2018-03-30 2018-10-02 新华三技术有限公司 Detect the method and device of failure
WO2021018122A1 (en) * 2019-07-30 2021-02-04 北京大学 Resource allocation and access method for open wireless channel
CN114896110A (en) * 2022-05-20 2022-08-12 龙芯中科技术股份有限公司 Link detection method, device, equipment and storage medium
CN115549775A (en) * 2022-12-05 2022-12-30 北京百度网讯科技有限公司 Method for processing optical signal transmission abnormity, optical transmission equipment and system
CN116647476A (en) * 2023-04-27 2023-08-25 天津中科曙光存储科技有限公司 Network management method, apparatus, computer device, storage medium, and program product

Also Published As

Publication number Publication date
CN117527637A (en) 2024-02-06

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant