CN113992501A - Fault positioning system, method and computing device - Google Patents

Publication number
CN113992501A
Authority
CN
China
Prior art keywords
slave
master
slave device
isolation
master device
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010656493.XA
Other languages
Chinese (zh)
Inventor
谢绍炜
李元有
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to CN202010656493.XA
Publication of CN113992501A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0677 Localisation of faults
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L12/00 Data switching networks
    • H04L12/28 Data switching networks characterised by path configuration, e.g. LAN [Local Area Networks] or WAN [Wide Area Networks]
    • H04L12/40 Bus networks
    • H04L12/40169 Flexible bus arrangements
    • H04L12/40176 Flexible bus arrangements involving redundancy
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0654 Management of faults, events, alarms or notifications using network fault recovery
    • H04L41/0659 Management of faults, events, alarms or notifications using network fault recovery by isolating or reconfiguring faulty entities

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The application discloses a fault positioning system, a fault positioning method and a computing device for quickly locating a failed device on a bus topology. The system comprises a master device, a first slave device, a second slave device, a first isolation device and a second isolation device, wherein the master device is connected to the first slave device and to the second slave device through a bus. The master device can control the first isolation device to disconnect the master device from the first slave device, control the second isolation device to connect the master device with the second slave device, and determine whether the communication between the master device and the second slave device is normal. Because the master device is then connected to, and communicates with, only the second slave device, it can determine from that communication whether the second slave device has failed, and the failed device on the bus topology can therefore be located accurately.

Description

Fault positioning system, method and computing device
Technical Field
The embodiment of the application relates to the technical field of fault location, in particular to a fault location system, a fault location method and a computing device.
Background
At present, many hardware products, such as servers and PCs, use a bus to communicate with different Field Replaceable Unit (FRU) components, and rely on the bus topology to carry out functions such as temperature detection, single-board information collection and simple control information transmission across those components. An FRU component is a module that a user can replace on site during the product maintenance stage, for example a power module, a fan module, a node server, a switch module, a management module or a chassis data module.
However, when one of the FRU components on a bus fails, the entire bus topology may enter an abnormal communication state in which normal communication cannot be performed; this is generally referred to as a hung state of the bus topology. Because the failed FRU component on the bus topology cannot be located directly, a user often replaces all FRU components at once in order to restore normal communication of the bus topology in a short time, which makes the maintenance cost of the bus topology high.
Disclosure of Invention
The embodiment of the application provides a fault positioning system, a fault positioning method and a computing device, which can quickly position a device with a fault on a bus topology in a short time, so that normal communication of the bus topology can be recovered only by replacing the device with the fault, and the maintenance cost of the bus topology is reduced.
In a first aspect, an embodiment of the present application provides a fault location system, which may include a master device, a first slave device, a second slave device, a first isolation device, and a second isolation device. The master device is connected to the first slave device and to the second slave device through a bus; the first isolation device may be used to control the on/off state of the link between the master device and the first slave device, and the second isolation device may be used to control the on/off state of the link between the master device and the second slave device. When the master device performs fault location, it may control the first isolation device to disconnect the master device from the first slave device and control the second isolation device to connect the master device with the second slave device. In this way, the master device is connected only to the second slave device at that moment and may communicate with the second slave device over that connection, thereby determining whether the communication between the master device and the second slave device is normal. It is understood that when the communication between the master device and the second slave device is abnormal, it may be determined that the second slave device has failed; when that communication is normal, it may be determined that the second slave device is normal and, accordingly, that one of the remaining slave devices on the bus has failed, for example the first slave device.
The master device is a device that obtains the control right of the bus, and the slave device is a device that is hung on the bus and accessed by the master device, and may be, for example, an FRU component such as a power module, a fan module, a node server, a switch module, a management module, and a chassis data module, or may be another component.
In this embodiment, since the master device only communicates with the second slave device through the isolation device, it can be determined whether the second slave device fails according to the communication condition with the second slave device, and when the second slave device does not fail, it can be determined that the first slave device fails, so that accurate positioning of the failed device on the bus topology can be achieved. In addition, the efficiency of locating the fault equipment is higher than the efficiency of a maintainer for checking the fault equipment one by one, so that the maintainer can restore the normal communication of the bus topology in a short time only by replacing the first slave equipment or the second slave equipment with faults without replacing all the equipment on the bus, and the maintenance cost of the bus topology can be effectively reduced.
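As a non-limiting illustration, the two-slave procedure described above may be sketched as follows. The class `SimulatedBus`, its methods `set_link` and `probe`, and the function `locate_fault` are hypothetical names introduced only for this sketch and do not appear in the application; the sketch also assumes that exactly one of the two slave devices has failed and that any failed slave device still attached to the bus hangs all communication.

```python
from typing import Dict, Set


class SimulatedBus:
    """Toy model (assumption): any connected faulty slave hangs the whole bus."""

    def __init__(self, faulty: Set[str]) -> None:
        self.faulty = faulty
        self.connected: Dict[str, bool] = {"slave1": True, "slave2": True}

    def set_link(self, slave: str, on: bool) -> None:
        # Stands in for driving the isolation device on that slave's link.
        self.connected[slave] = on

    def probe(self, slave: str) -> bool:
        # Communication is normal only if the target is connected and
        # no faulty slave is currently attached to the bus.
        hung = any(on and name in self.faulty
                   for name, on in self.connected.items())
        return self.connected[slave] and not hung


def locate_fault(bus: SimulatedBus) -> str:
    """Return the slave inferred to have failed ('slave1' or 'slave2')."""
    bus.set_link("slave1", False)  # first isolation device: disconnect
    bus.set_link("slave2", True)   # second isolation device: connect
    if bus.probe("slave2"):
        return "slave1"  # slave 2 is normal, so slave 1 is inferred faulty
    return "slave2"      # slave 2 alone cannot communicate
```

In this sketch the first slave device is only inferred to have failed; it is not itself tested.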
In a possible implementation, the master device is further configured to control the first isolation device to connect the master device with the first slave device and, when it is determined that communication between the master device and the second slave device is abnormal, to control the second isolation device to disconnect the second slave device from the master device and determine whether communication between the master device and the first slave device is normal. In this embodiment, the master device may also control the isolation devices to check whether the first slave device has failed, so that the accuracy of determining whether the first slave device has failed can be improved.
In another possible implementation, the master device is further configured to mark the second slave device as abnormal when determining that the communication between the master device and the second slave device is abnormal; likewise, the master device marks the first slave device as abnormal when determining that the communication between the master device and the first slave device is abnormal. In this way, the master device can distinguish the failed first slave device or second slave device from the plurality of devices attached to the bus according to the mark. Marking a slave device as abnormal may specifically mean, for example, recording the identifier of the failed device or adding an abnormal flag to the failed slave device.
In another possible implementation, the master device is further configured to perform a fault alarm for the second slave device when determining that the communication between the master device and the second slave device is abnormal, or perform a fault alarm for the first slave device when determining that the communication between the master device and the first slave device is abnormal. For example, the master device may report fault alarm information corresponding to the first slave device or the second slave device to the upper management device to notify the upper management device; alternatively, the master device may also alarm a fault to the user through an indicator/buzzer corresponding to the first slave device or an indicator/buzzer corresponding to the second slave device, for example, when the first slave device fails, the indicator corresponding to the first slave device is turned on, or the buzzer sounds an alarm. Therefore, when the operation and maintenance personnel determine that the master equipment sends out fault alarm aiming at some slave equipment, the operation and maintenance personnel can replace the slave equipment.
In another possible implementation, the master device is further configured to raise a fault alarm for the master device itself when it is determined that the communication between the master device and the first slave device and the communication between the master device and the second slave device are both abnormal. When communication is abnormal with every slave device even though the master device communicates with each of them in isolation, the fault may lie in the master device rather than in all of the slave devices; in this case, the master device may raise a fault alarm for itself.
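The marking and alarm behaviour described in the preceding implementations may be sketched as follows. This is a minimal sketch under stated assumptions: `SimulatedTopology`, `isolate_all_but`, `probe` and `diagnose` are hypothetical names, each slave device is tested alone regardless of the earlier result (a simplification), and the returned list stands in for the marks or alarms that the master device would raise.

```python
from typing import List, Set


class SimulatedTopology:
    """Toy model (assumption): a faulty master breaks all communication,
    and any connected faulty slave hangs the bus."""

    def __init__(self, faulty: Set[str]) -> None:
        self.faulty = faulty  # may contain "master", "slave1", "slave2"
        self.connected = {"slave1": True, "slave2": True}

    def isolate_all_but(self, target: str) -> None:
        # Connect only the target; disconnect every other slave.
        for name in self.connected:
            self.connected[name] = (name == target)

    def probe(self, target: str) -> bool:
        if "master" in self.faulty:
            return False
        hung = any(on and name in self.faulty
                   for name, on in self.connected.items())
        return self.connected[target] and not hung


def diagnose(topo: SimulatedTopology) -> List[str]:
    """Return the devices for which a fault alarm would be raised."""
    abnormal: List[str] = []
    for target in ("slave2", "slave1"):
        topo.isolate_all_but(target)
        if not topo.probe(target):
            abnormal.append(target)  # mark this slave as abnormal
    if len(abnormal) == 2:
        # Every slave fails even when tested alone: alarm on the master.
        return ["master"]
    return abnormal
```

Note that the sketch cannot tell a failed master device from two simultaneously failed slave devices; the text likewise says only that the abnormality may lie in the master device.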
In another possible implementation, the master device may control the first isolation device to disconnect the master device from the first slave device and control the second isolation device to connect the master device with the second slave device only when a communication fault is still present after the master device has performed reset processing a preset number of consecutive times. This avoids, as far as possible, the master device misjudging the bus topology as abnormal because of a program running error and then executing an unnecessary fault location process. Likewise, when a new slave device generates an interference signal while being attached to the bus topology and causes a transient communication abnormality, one or more rounds of reset processing prevent the master device from executing an unnecessary fault location process because of that transient abnormality.
In another possible embodiment, the first isolation device and the second isolation device may specifically be circuits including metal-oxide-semiconductor field-effect transistors (abbreviated as MOS transistors) or bipolar junction transistors (abbreviated as BJTs). Alternatively, an isolation device may be a circuit including a discrete circuit or a switch chip, or the like, that controls the on/off state of the connection between the master device and a slave device.
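The reset condition described above may be sketched as follows. The function name `fault_persists_after_resets`, the callback `reset_and_check` and the parameter `preset_times` are assumptions made for illustration; the callback is assumed to perform one reset and return True if communication has recovered.

```python
from typing import Callable


def fault_persists_after_resets(reset_and_check: Callable[[], bool],
                                preset_times: int) -> bool:
    """Return True only if communication is still abnormal after
    `preset_times` consecutive resets, i.e. fault location should start."""
    for _ in range(preset_times):
        if reset_and_check():  # True means communication recovered
            return False       # transient abnormality; no fault location
    return True
```

For example, a transient abnormality that clears on the second reset yields False, so the isolation devices are left untouched.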
In another possible embodiment, the first isolation device and the second isolation device may specifically be MOS transistors, and then drains of the first isolation device and the second isolation device are respectively connected to the master device, while a source of the first isolation device may be connected to the first slave device, and a source of the second isolation device may be connected to the second slave device. In this way, the master device may control the potential of the gate of the first isolation device to be a first preset potential (for example, a high potential) so as to disconnect the master device from the first slave device; and controlling the potential of the gate of the second isolation device to be a second preset potential (for example, a low potential) so that the master device is connected with the second slave device. In this way, the connection or disconnection of the link where the isolation device is located can be controlled by controlling the connection or disconnection between the source and the drain of the isolation device.
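The gate control described above may be sketched as a mapping from the slave device under test to the gate potential of each isolation device. The sketch adopts the example polarity given in the text (high potential disconnects, low potential connects); the names `Level`, `DISCONNECT_LEVEL`, `CONNECT_LEVEL` and `gate_levels` are hypothetical.

```python
from enum import Enum
from typing import Dict, Iterable


class Level(Enum):
    LOW = 0
    HIGH = 1


# Example polarity from the text (an assumption for any real circuit):
DISCONNECT_LEVEL = Level.HIGH  # "first preset potential"
CONNECT_LEVEL = Level.LOW      # "second preset potential"


def gate_levels(target: str, slaves: Iterable[str]) -> Dict[str, Level]:
    """Gate potential per isolation device so that only `target` is connected."""
    return {name: CONNECT_LEVEL if name == target else DISCONNECT_LEVEL
            for name in slaves}
```

A circuit with the opposite polarity would only need the two constants swapped.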
In another possible implementation, the system may include three or more slave devices and corresponding isolation devices. Taking as an example a system that further includes a third slave device and a third isolation device, where the third slave device is connected to the master device through the bus: in the process of detecting whether the second slave device has failed, the master device may control the third isolation device to disconnect the master device from the third slave device while controlling the first isolation device to disconnect the master device from the first slave device; similarly, in the process of detecting whether the first slave device has failed, the master device may control the second isolation device to disconnect the master device from the second slave device and, at the same time, control the third isolation device to disconnect the master device from the third slave device; and in the process of detecting whether the third slave device has failed, the master device may control the third isolation device to connect the master device with the third slave device, control the first isolation device to disconnect the master device from the first slave device, and control the second isolation device to disconnect the master device from the second slave device.
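The extension to three or more slave devices may be sketched as a loop that connects one slave device at a time and disconnects all others. The names `make_simulation` and `locate_faulty_slaves` are hypothetical, and the simulation assumes that any failed slave device still attached to the bus hangs all communication.

```python
from typing import Callable, Dict, List, Sequence, Set, Tuple


def make_simulation(slaves: Sequence[str], faulty: Set[str]
                    ) -> Tuple[Callable[[str, bool], None],
                               Callable[[str], bool]]:
    """Toy bus model (assumption) returning (set_link, probe) callables."""
    connected: Dict[str, bool] = {name: True for name in slaves}

    def set_link(name: str, on: bool) -> None:
        connected[name] = on  # stands in for one isolation device

    def probe(target: str) -> bool:
        hung = any(on and name in faulty for name, on in connected.items())
        return connected[target] and not hung

    return set_link, probe


def locate_faulty_slaves(slaves: Sequence[str],
                         set_link: Callable[[str, bool], None],
                         probe: Callable[[str], bool]) -> List[str]:
    """Test each slave alone on the bus and collect those that fail."""
    faulty: List[str] = []
    for target in slaves:
        for name in slaves:
            set_link(name, name == target)  # connect only the target
        if not probe(target):
            faulty.append(target)
    return faulty
```

Because each slave device is tested alone, the loop can report more than one failed device, at a cost of one probe per slave device.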
In a second aspect, an embodiment of the present application further provides a fault location method, where the fault location method is applied to a master device, and the master device is connected to a first slave device and a second slave device through a bus, respectively, and the method includes: the master device controls the first isolation device to disconnect the connection between the master device and the first slave device, and controls the second isolation device to connect the master device and the second slave device; the master device determines whether communication between the master device and the second slave device is normal.
In one possible embodiment, the method further comprises: the master device controls the first isolation device to connect the master device and the first slave device, and controls the second isolation device to disconnect the second slave device from the master device when determining that the communication between the master device and the second slave device is abnormal; the master device determines whether communication between the master device and the first slave device is normal in a case where the master device is connected to the first slave device.
In another possible embodiment, the method further comprises: when the master device determines that the communication between the master device and the second slave device is abnormal, the master device marks the second slave device as abnormal.
In another possible embodiment, the method further comprises: when it is determined that the communication between the master device and the second slave device is abnormal, the master device raises a fault alarm for the second slave device.
In one possible embodiment, the method further comprises: when the master device determines that both the communication between the master device and the first slave device and the communication between the master device and the second slave device are abnormal, the master device raises a fault alarm for the master device.
In another possible implementation, the master device controlling the first isolation device to disconnect the master device from the first slave device, controlling the second isolation device to connect the master device with the second slave device, and determining whether communication between the master device and the second slave device is normal includes: when a communication fault is still present after the master device has performed reset processing a preset number of consecutive times, the master device controls the first isolation device to disconnect the master device from the first slave device, controls the second isolation device to connect the master device with the second slave device, and determines whether the communication between the master device and the second slave device is normal.
In another possible embodiment, the first isolation device and the second isolation device comprise a circuit with a metal-oxide semiconductor field effect transistor MOSFET or a bipolar transistor.
In another possible embodiment, the first isolation device and the second isolation device are MOSFETs, drains of the first isolation device and the second isolation device are respectively connected to the master device, a source of the first isolation device is connected to the first slave device, and a source of the second isolation device is connected to the second slave device; the master device controlling the first isolation device to disconnect the master device from the first slave device and controlling the second isolation device to connect the master device with the second slave device includes: the master device controls the potential of the gate of the first isolation device to be a first preset potential so as to disconnect the master device from the first slave device, and controls the potential of the gate of the second isolation device to be a second preset potential so as to connect the master device with the second slave device.
In another possible embodiment, the master device is further connected to a third slave device via a bus, and the method further includes: the master device controls the first isolation device to disconnect the connection between the master device and the first slave device, controls the second isolation device to disconnect the connection between the master device and the second slave device, and controls the third isolation device to connect the master device and the third slave device.
In a third aspect, based on the same inventive concept as the method embodiment of the second aspect, an embodiment of the present application provides a computing apparatus that is applied to the master device described in the second aspect; that is, the computing apparatus may be the master device, or may be a chip or a processor that can be applied to the master device. The computing apparatus has the function of implementing the embodiments of the second aspect described above. The function can be realized by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the functions described above.
In a fourth aspect, an embodiment of the present application provides an apparatus, including: a processor and a memory; the memory is configured to store instructions, and when the apparatus is running, the processor executes the instructions stored in the memory, so as to cause the apparatus to perform the fault location method in the second aspect or in any implementation of the second aspect. It should be noted that the memory may be integrated into the processor or may be independent of the processor. The apparatus may also include a bus, through which the processor is connected to the memory. The memory may include a read-only memory and a random access memory, among others.
In a fifth aspect, the present application further provides a readable storage medium storing a program or instructions which, when run on a computer, cause the fault location method in any of the above aspects to be executed.
In a sixth aspect, embodiments of the present application further provide a computer program product containing instructions, which when run on a computer, cause the computer to perform any of the fault location methods in the above aspects.
In addition, for technical effects brought by any one implementation manner of the second aspect to the sixth aspect, reference may be made to technical effects brought by different implementation manners of the first aspect, and details are not described here.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments described in the present application, and other drawings can be obtained by those skilled in the art according to the drawings.
FIG. 1 is a schematic diagram of a bus topology;
FIG. 2 is a block diagram of a fault location system according to an embodiment of the present application;
FIG. 3 is a schematic diagram illustrating the use of a MOS transistor to control the on/off state of a link between a master device and a slave device in an embodiment of the present application;
FIG. 4 is a schematic diagram illustrating a master device using MOS transistors to control the on/off state of the links between the master device and different slave devices in an embodiment of the present application;
FIG. 5 is a block diagram of another exemplary embodiment of a fault location system;
FIG. 6 is a schematic flowchart of a fault location method in an embodiment of the present application;
FIG. 7 is a schematic structural diagram of a computing apparatus applied to a master device in an embodiment of the present application;
FIG. 8 is a schematic structural diagram of an apparatus according to an embodiment of the present application.
Detailed Description
A bus topology is a topology in which a bus serves as the common transmission medium for data (including instructions and the like), and a plurality of node devices in a network are directly connected to the bus through corresponding hardware interfaces and cables. As shown in fig. 1, devices 1 to 5 may be attached to a bus 100, and different devices may communicate with one another through the bus. A device capable of obtaining control of the bus may be referred to as a master device, and a device that is selected by the master device and communicates with it through the bus may be referred to as a slave device.
In practice, the device attached to the bus may be a FRU component and may be field replaceable by a user, for example, when the FRU component fails, the user may replace the failed FRU component in the field and attach a new FRU component to the bus. Of course, the device mounted on the bus may be another component, and the present application does not limit this.
However, when a hardware fault occurs in a device on the bus, for example when a pull-up resistor, a series resistor or a chip of the device fails, normal communication between the master device and the slave devices on the entire bus topology may become impossible; the bus topology is then said to be in a hung state. Taking the bus topology shown in fig. 1 as an example, assume that device 1 is the master device and devices 2 to 5 are slave devices; device 1 may transmit a control signal or a data signal to device 5 through the bus to communicate with device 5. If any of devices 2 to 4 then fails, the failed device may make the hardware link impedance of the bus abnormal, so that the electrical signal transmitted on the bus is reflected abnormally and its level no longer meets the requirements of the bus protocol. As a result, device 1 or device 5 cannot successfully parse the communication data from the received signal, and communication between the devices becomes abnormal.
In this case, the communication management module on the master device can usually only detect that a hardware fault exists among the devices on the bus, and cannot accurately determine which device has failed. Therefore, maintenance personnel commonly check the devices on the bus one by one and replace on site the failed devices that the check identifies. However, checking devices one by one is time-consuming, and in some service scenarios the bus topology may be required to recover communication in a short time so that service can be restored quickly. In that case, the maintenance personnel usually choose to replace all devices on the bus at once to restore normal communication, which makes the maintenance cost of the bus topology high and tends to reduce customer satisfaction.
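As general transmission-line background rather than part of the application, the reflection caused by an impedance discontinuity is commonly characterised by the voltage reflection coefficient. The symbols $Z_0$ (characteristic impedance of the link) and $Z_L$ (impedance presented at the discontinuity) are introduced here for illustration only:

```latex
\Gamma = \frac{Z_L - Z_0}{Z_L + Z_0}
```

When $Z_L = Z_0$, $\Gamma = 0$ and no reflection occurs; for a resistive $Z_L$, the further a failed device shifts $Z_L$ away from $Z_0$, the larger $|\Gamma|$ becomes and the more the received signal level may deviate from what the bus protocol requires.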
Therefore, the embodiment of the application provides a fault locating system, which is used for rapidly locating a device with a fault on a bus topology in a short time, so that normal communication of the bus topology can be recovered only by replacing the device with the fault, and the maintenance cost of the bus topology is reduced. Specifically, the system may include a master device, a plurality of isolation devices, and a plurality of slave devices, and here, for example, the system includes a first isolation device, a second isolation device, a first slave device, and a second slave device. The master device is respectively connected with the first slave device and the second slave device through buses. For example, a first isolation device may be connected in a link between a master device and a first slave device, and a second isolation device may be connected in a link between the master device and a second slave device. The master device may control the first isolation device to disconnect the master device from the first slave device and control the second isolation device to connect the master device with the second slave device, so that the master device may determine whether communication with the second slave device is normal based on the connection with the second slave device. It is understood that when an abnormality occurs in communication between the master device and the second slave device, it may be determined that the second slave device has failed. 
Because the master device communicates only with the second slave device, whether the second slave device has failed can be determined from the communication with the second slave device; when the second slave device has not failed, it can be determined that the first slave device has failed. The failed device on the bus topology can thus be located accurately, and more efficiently than by maintenance personnel checking the devices one by one.
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, various non-limiting embodiments accompanying the present application examples are described below with reference to the accompanying drawings. It is to be understood that the embodiments described are only a few embodiments of the present application and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Referring to fig. 2, an architecture diagram of a fault location system in an embodiment of the present application is shown. As shown in fig. 2, the fault location system may include a master device, a first slave device, and a second slave device, and the master device is connected to the first slave device and the second slave device through a bus, respectively. The master device and the slave device are both devices mounted on the bus, the master device has the control right of the bus, the slave device can be accessed through the bus, and the slave device can be an FRU component mounted on the bus.
For example, the bus in this embodiment may be a bus that supports multiple slave devices, such as an Inter-Integrated Circuit (I2C) bus, a System Management Bus (SMBus), or a Serial Peripheral Interface (SPI) bus. Of course, other types of buses are also possible, and this embodiment does not limit this.
In addition, the fault location system further includes a first isolation device and a second isolation device. The first isolation device may be connected to the link between the master device and the first slave device and control the on/off state of that link; for example, it may be connected in series in the link, although any other connection manner that allows it to control the on/off state of the link may also be used, which is not limited in this embodiment. Similarly, the second isolation device may be connected to the link between the master device and the second slave device and control the on/off state of that link, for example by being connected in series in the link or in any other connection manner that allows such control, which is likewise not limited in this embodiment. As an example, the first isolation device and the second isolation device may each be a circuit having a Metal-Oxide-Semiconductor Field-Effect Transistor (MOSFET, abbreviated as MOS transistor) or a Bipolar Junction Transistor (BJT), or may be a circuit including a discrete circuit or a switch chip, so as to control the on/off state of the connection between the master device and the slave device.
The first isolation device and the second isolation device may be two independent devices, or may be integrated into one device, for example, the integrated device may include a plurality of sub devices, each sub device may be connected to a link between a master device and a slave device, and the links connected to different sub devices are different, so that the integrated device may control the on/off of the link between the master device and each slave device by controlling the on/off of the circuit or the level of the electric potential of each sub device through the chip. Of course, the specific expression of the first isolation device and the second isolation device in this embodiment is not limited.
For the sake of understanding, the principle of the isolation device controlling the connection and disconnection of the link between the master device and the slave device is exemplified by taking the isolation device as a circuit with a MOS transistor as an example. As shown in fig. 3, the MOS transistor may be connected in series to a link between the master device and the slave device, and the MOS transistor may include a drain (D terminal), a source (S terminal), and a gate (G terminal), and it is assumed that when the G terminal is at a low potential, the D terminal and the S terminal may be turned on, and at this time, the master device and the slave device are in a connected state, and when the G terminal is at a high potential, the D terminal and the S terminal may be turned off, and at this time, the master device and the slave device are in a disconnected state. Of course, this embodiment is only used as an exemplary illustration, and a specific implementation manner of controlling the on/off of the link between the master device and the slave device by using the MOS is not limited in this embodiment.
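The gate behaviour described for fig. 3 can be sketched as a small model. This is only an illustration under the assumption stated in the text (low gate potential turns the transistor on, high gate potential turns it off); the class name `MosIsolator` and the constants `LOW` and `HIGH` are hypothetical and do not come from the application.

```python
from dataclasses import dataclass

LOW, HIGH = 0, 1  # illustrative logic levels for the gate (G terminal)


@dataclass
class MosIsolator:
    """Toy model of the series MOS transistor in fig. 3 (hypothetical class)."""

    gate: int = LOW  # assumed default: gate low, so the link conducts

    def conducting(self) -> bool:
        # Assumption taken from the text: a low gate potential turns the
        # D and S terminals on, a high gate potential turns them off.
        return self.gate == LOW


def link_connected(isolator: MosIsolator) -> bool:
    """The master-slave link is up only while the transistor conducts."""
    return isolator.conducting()
```

A transistor with the opposite polarity would simply invert the comparison in `conducting`; the rest of the model is unchanged.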
In this embodiment, the master device may control connection and disconnection between the master device and the plurality of slave devices by sending a control signal to the isolation device. Specifically, the master device may send a first control signal to the first isolation device, where the first control signal may be used to control the first isolation device to disconnect the master device from the first slave device, and meanwhile, the master device may send a second control signal to the second isolation device, where the second control signal may control the second isolation device to connect the master device with the second slave device. At this time, the master device may be connected only with the second slave device.
Taking the case where the first isolation device and the second isolation device are both MOS transistors as an example, as shown in fig. 4, the fault location system includes at least a first MOS transistor and a second MOS transistor, which are connected in series in the link between the master device and the first slave device and in the link between the master device and the second slave device, respectively, with the D terminal and the S terminal of each MOS transistor located in the link in which that transistor sits. The master device is further configured with a general-purpose input/output (GPIO) interface, which may include at least two pins, namely pin 1 and pin 2, where pin 1 is connected to the G terminal of the first MOS transistor and pin 2 is connected to the G terminal of the second MOS transistor. The master device may then use the GPIO interface to output a first control signal that sets pin 1 to a first preset potential (for example, a high potential) and a second control signal that sets pin 2 to a second preset potential (for example, a low potential). Because the potential of the G terminal of the first MOS transistor follows pin 1 and is the first preset potential, the D terminal and the S terminal of the first MOS transistor are disconnected, and the connection between the master device and the first slave device is therefore broken. The potential of the G terminal of the second MOS transistor follows pin 2 and is the second preset potential, so the D terminal and the S terminal of the second MOS transistor are turned on and the master device and the second slave device are connected. At this moment the master device is therefore connected only to the second slave device and is disconnected from the remaining slave devices.
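The pin-to-gate arrangement of fig. 4 can be sketched as follows. This is a minimal model, assuming one GPIO pin per MOS gate and the low-conducts, high-disconnects convention used in the example above; `GpioIsolation` and `select_only` are hypothetical names, and index 0 stands for pin 1, index 1 for pin 2.

```python
from typing import List

LOW, HIGH = 0, 1  # illustrative pin levels


class GpioIsolation:
    """Hypothetical model: one GPIO pin drives the gate of one MOS transistor."""

    def __init__(self, pin_count: int) -> None:
        # Assumed idle state: every pin low, so every link conducts.
        self.pins: List[int] = [LOW] * pin_count

    def select_only(self, index: int) -> None:
        # Drive the selected pin low (link on) and all other pins high
        # (links off), so the master sees exactly one slave.
        self.pins = [LOW if i == index else HIGH for i in range(len(self.pins))]

    def connected(self) -> List[int]:
        """Indices of the slaves whose links currently conduct."""
        return [i for i, level in enumerate(self.pins) if level == LOW]
```

For instance, `GpioIsolation(2).select_only(1)` corresponds to the first and second control signals in the text: pin 1 goes high and pin 2 stays low, leaving only the second slave device connected.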
It should be noted that, the implementation manner of using the GPIO interface and the MOS transistor to implement the link connection and disconnection between the master device and the slave device is only an exemplary illustration and is not limited to this, for example, in other possible implementations, the isolation device may be a device including a switch chip, and the master device may also implement the link connection and disconnection control between the master device and different slave devices by sending a link control instruction to the switch chip.
The master device may then communicate with the second slave device based on its connection with the second slave device. As an example, when the master device communicates with the second slave device, a reset process may be performed first, so that the master device returns to an initial state, so as to avoid as much as possible a communication abnormality between the master device and the second slave device caused by a program operation error on the master device, thereby causing the master device to misjudge that the second slave device has a fault. Then, the master device may send the service data or the test data to the second slave device based on the connection between the master device and the second slave device, and determine whether there is an abnormality in the communication between the master device and the second slave device according to the data transceiving situation.
When the communication between the master device and the second slave device is abnormal, the master device may determine that the second slave device may have failed, which locates the failed slave device in the bus topology. When the communication between the master device and the second slave device is normal, the master device may determine that the second slave device has not failed, and may therefore locate the failed slave device in the bus topology as the first slave device. Optionally, when the master device determines that the first slave device or the second slave device has failed, it may mark that slave device as abnormal, for example by recording an identifier of the failed device or by adding an abnormality flag to it, so that the failed slave device can be distinguished from the other devices attached to the bus.
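The two-slave decision can be sketched as a short function. It assumes that a communication abnormality on the bus has already been detected and that only the second slave device is currently connected; the function name, the `second_slave_ok` callback, and the string identifiers are hypothetical.

```python
from typing import Callable, Set


def locate_faulty_slave(second_slave_ok: Callable[[], bool],
                        fault_log: Set[str]) -> str:
    """Return the suspected slave and record its identifier.

    Precondition (assumed): the bus is known to be abnormal and the
    isolation devices leave only the second slave connected.
    """
    # Normal communication with the second slave clears it, which leaves
    # the first slave as the suspect; abnormal communication points at
    # the second slave itself.
    suspect = "first_slave" if second_slave_ok() else "second_slave"
    fault_log.add(suspect)  # the "abnormal marking" step from the text
    return suspect
```

Note that a normal result only implicates the first slave device by elimination; verifying the first slave device directly, as described later, gives a stronger conclusion.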
Further, when determining that the first slave device fails or the second slave device fails, the master device may perform a failure alarm for the first slave device or perform a failure alarm for the second slave device. For example, the master device may report fault alarm information corresponding to the first slave device or the second slave device to the upper management device to notify the upper management device; alternatively, the master device may also alarm a fault to the user through an indicator/buzzer corresponding to the first slave device or an indicator/buzzer corresponding to the second slave device, for example, when the first slave device fails, the indicator corresponding to the first slave device is turned on, or the buzzer sounds an alarm.
In some practical scenarios, the master device may misjudge that the bus topology is in a hang-up state. For example, the master device may wrongly conclude that the bus topology is abnormal because of a program operation error. Alternatively, when a new slave device accesses the bus topology through a hardware interface, an interference signal generated during the access process may degrade the quality of the signals transmitted on the bus, causing a short-lived communication abnormality that clears once the slave device has successfully accessed the bus topology. Therefore, in some possible embodiments, when the master device detects a communication abnormality in the bus topology, it may perform reset processing one or more times, and determine that the bus topology has a communication abnormality only when communication still fails after a preset number of consecutive resets. The master device then locates the faulty device through the isolation devices.
In this embodiment, the master device uses the isolation devices to communicate with only the second slave device, so that it can determine from that communication whether the second slave device has failed; when the second slave device has not failed, it can be determined that the first slave device has failed. The failed device on the bus topology can thus be located accurately, and more efficiently than by having maintenance personnel check the devices one by one.
In the above embodiment, the master device may determine whether the second slave device has failed by controlling the isolation devices to disconnect the master device from the first slave device and connect the master device with the second slave device, and further determine that the first slave device has failed when it is determined that the second slave device has not failed, so as to accurately locate the failed device. In a further possible embodiment, the master device may also control the isolation devices to check whether the first slave device has failed, so as to improve the accuracy of that determination.
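The reset-and-recheck step can be sketched as below. This is a minimal sketch, assuming the master device exposes a reset action and a bus health check as callables; `confirm_bus_abnormal`, `reset`, `bus_ok`, and the default of three attempts are all hypothetical choices rather than values from the application.

```python
from typing import Callable


def confirm_bus_abnormal(reset: Callable[[], None],
                         bus_ok: Callable[[], bool],
                         max_attempts: int = 3) -> bool:
    """Return True only if the bus stays abnormal after every reset.

    max_attempts stands in for the "preset number" of consecutive
    failed resets mentioned in the text.
    """
    for _ in range(max_attempts):
        reset()  # bring the master back to its initial state
        if bus_ok():
            # The abnormality was transient (for example, hot-plug noise
            # or a program error), so no fault location is needed.
            return False
    return True  # still failing: go on to isolate the slaves one by one
```

Returning early on the first successful check keeps a transient disturbance from triggering the slower isolation procedure.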
Specifically, the master device may control the first isolation device to connect the master device with the first slave device, so that the master device can determine, based on this connection, whether the first slave device has failed. Note that when the communication between the master device and the second slave device has been determined to be abnormal, the second isolation device is controlled to disconnect the master device from the second slave device, so that the fault of the second slave device does not cause the master device to misjudge whether the first slave device has failed. Still taking fig. 4 as an example, if the isolation devices are MOS transistors, the master device may use the GPIO interface to output a third control signal that sets pin 1 to a low potential. The potential of the G terminal of the first MOS transistor follows pin 1 and is therefore low, so the D terminal and the S terminal of the first MOS transistor are turned on and the master device is connected to the first slave device. Meanwhile, if the master device has previously determined that the second slave device has failed, it may use the GPIO interface to output a fourth control signal that sets pin 2 to a high potential. The potential of the G terminal of the second MOS transistor follows pin 2 and is therefore high, so the D terminal and the S terminal of the second MOS transistor are disconnected and the master device is disconnected from the second slave device.
Of course, in other examples, if the master device determines that the second slave device does not malfunction, the master device may control the second isolation device to disconnect the connection between the master device and the second slave device, or may maintain the connection between the master device and the second slave device, and at this time, the master device may output the control signal with the high potential at the pin 2 or may output the control signal with the low potential at the pin 2 by using the GPIO interface.
The master device may then communicate with the first slave device based on the connection with the first slave device. The process of communication between the master device and the first slave device is similar to the process of communication between the master device and the second slave device, and the master device can send service data or test data to the first slave device and determine whether the first slave device fails according to the data receiving and sending conditions; or, after the reset process, the master device may send service data or test data to the first slave device, and determine whether the first slave device fails according to the data transceiving condition.
When the master device determines that the communication with the first slave device is normal, the master device may determine that the first slave device has not failed; when the communication between the master device and the first slave device is abnormal, the master device may determine that the first slave device may have a fault, and further, at this time, the master device may further perform an abnormal marking on the first slave device, for example, an identifier of the faulty device may be recorded, so as to distinguish the faulty first slave device from the multiple devices attached to the bus.
In a further embodiment, when it is determined that the communication between the master device and the first slave device is abnormal and the communication between the master device and the second slave device is also abnormal, the cause may instead be a failure of the master device itself, and in this case the master device may also give a fault alarm for the master device. For example, alarm information indicating that the master device has failed is reported to the upper management device, or a fault alarm is given through an indicator/buzzer.
Of course, besides a failure of the master device, it is also possible that all of the slave devices have failed. Therefore, in one example, after the master device determines that all slave devices have failed, the master device may connect to a trusted self-test device and check whether its communication with the self-test device is normal. It can be understood that when the communication between the master device and the self-test device is normal, this indicates that the master device has not failed, and the master device may then determine that all slave devices attached to the bus have failed. When the communication between the master device and the self-test device is abnormal, this indicates that the master device has failed, and the master device may then determine that it has itself failed and may further give a fault alarm.
It should be noted that the above two embodiments take a fault location system with two slave devices as an example; in other embodiments, the fault location system may include three or more slave devices. The following describes how the fault location system accurately locates the faulty device, taking a system with three slave devices (a first slave device, a second slave device, and a third slave device) as an example.
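The self-test check can be sketched as a small decision function. It assumes the result of communicating with the trusted self-test device is already available as a boolean; `resolve_all_abnormal` and the `"master"` label are hypothetical, and the status of the slave devices is reported as unknown when the master device itself is at fault.

```python
from typing import Dict, List, Sequence


def resolve_all_abnormal(slave_ids: Sequence[str],
                         self_test_ok: bool) -> Dict[str, List[str]]:
    """Decide who is at fault when every slave appeared abnormal."""
    if self_test_ok:
        # The master talks to the trusted device normally, so the
        # abnormal results really belong to the slaves.
        return {"faulty": list(slave_ids), "unknown": []}
    # The master cannot even reach the trusted device: blame the master
    # and treat the earlier slave results as inconclusive.
    return {"faulty": ["master"], "unknown": list(slave_ids)}
```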
Referring to fig. 5, a schematic structural diagram of another fault location system in the embodiment of the present application is shown. In this embodiment, the fault locating system includes a master device, a first isolation device, a second isolation device, a third isolation device, a first slave device, a second slave device, and a third slave device. As shown in fig. 5, the master device may be connected to the first slave device, the second slave device, and the third slave device through a bus, and each isolation device is connected to a link between the master device and the slave device, for example, the isolation device may be connected in series to a link between the master device and the slave device.
When the master device determines that the bus topology has a communication abnormality, the master device may send a control signal to each isolation device to control the on/off state of the links between the master device and the multiple slave devices. Specifically, it controls the first isolation device to disconnect the master device from the first slave device, controls the third isolation device to disconnect the master device from the third slave device, and controls the second isolation device to connect the master device with the second slave device, so that the master device is connected to only the second slave device at that moment. For how the master device uses the isolation devices to connect to and disconnect from the different slave devices, refer to the relevant description above; details are not repeated here.
Then, the master device may communicate with the second slave device based on the connection between the master device and the second slave device, and determine whether or not an abnormality occurs in the communication between the master device and the second slave device. For example, the master device may send service data or test data to the second slave device, and determine whether the communication between the master device and the second slave device is abnormal according to the data transceiving condition between the master device and the second slave device. When the communication between the master device and the second slave device is abnormal, the master device can determine that the second slave device may have a fault, so that the positioning of the fault device can be realized. And as for the first slave device and the third slave device, a fault may occur or no fault may occur, so that the master device may further perform fault detection and location on the first slave device and the third slave device.
Specifically, the master device may continue to send control signals to each isolation device to control the first isolation device to connect the master device and the first slave device, control the second isolation device to disconnect the master device from the second slave device, and control the third isolation device to disconnect the master device from the third slave device, so that the master device is connected to only the first slave device at that moment. Of course, in other possible embodiments, when it is determined that the second slave device has not failed, the master device may also control the second isolation device to connect the master device and the second slave device, in which case the master device may be connected to both the first slave device and the second slave device at the same time, which is not limited in this embodiment. However, when it is determined that the second slave device has failed, the master device may control the second isolation device to disconnect the master device from the second slave device, so that the failed second slave device does not affect the fault detection of the other slave devices.
Then, the master device may communicate with the first slave device based on the connection between the master device and the first slave device, and determine whether the communication between the master device and the first slave device is abnormal. When the communication between the master device and the first slave device is normal, and the second slave device was likewise found not to have failed, it may be determined that the failed device in the bus topology is the remaining third slave device. When the communication between the master device and the first slave device is abnormal, the master device can determine that the first slave device has failed, so that the failed device is accurately located.
At the same time, the third slave device may also be a failed device. Therefore, in a further possible embodiment, the master device may continue to detect whether the third slave device has failed: it controls the first isolation device to disconnect the master device from the first slave device, controls the second isolation device to disconnect the master device from the second slave device, and controls the third isolation device to connect the master device with the third slave device, so that the master device is connected to only the third slave device at that moment. Of course, if the master device has determined that the communication between the master device and the first slave device is normal, the connection between the master device and the first slave device may be disconnected or may be maintained, which is not limited in this embodiment; if the master device has determined that the communication between the master device and the first slave device is abnormal, the connection between the master device and the first slave device can be disconnected, so that the failed first slave device does not affect the fault detection of the third slave device. Similarly, for the second slave device, if the master device has determined that the communication between the master device and the second slave device is normal, the connection between the master device and the second slave device may be disconnected or may be maintained, which is not limited in this embodiment; if the master device has determined that the communication between the master device and the second slave device is abnormal, the connection between the master device and the second slave device can be disconnected, so that the failed second slave device does not affect the fault detection of the third slave device.
Then, the master device may communicate with the third slave device based on the connection between the master device and the third slave device, and determine whether or not the communication between the master device and the third slave device is abnormal. When the communication between the master device and the third slave device is abnormal, the master device may determine that the third slave device may have a fault, and may locate the faulty third slave device.
Optionally, when the master device determines that all of the first slave device, the second slave device, and the third slave device have a fault, it may also be that the master device has a fault, so that an abnormality occurs in communication between the master device and each slave device, and at this time, the master device may determine that its own device has a fault.
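The three-slave procedure generalises to any number of slave devices: connect one slave at a time, probe it, and record the result. The sketch below assumes two callables supplied by the platform, `select_only` (drive the isolation devices so that only the named slave is connected) and `probe` (send test data and report whether communication is normal); both names, like `scan_slaves` itself, are hypothetical.

```python
from typing import Callable, Dict, List, Sequence


def scan_slaves(slave_ids: Sequence[str],
                select_only: Callable[[str], None],
                probe: Callable[[str], bool]) -> Dict[str, object]:
    """Probe each slave in isolation and collect the abnormal ones."""
    faulty: List[str] = []
    for sid in slave_ids:
        select_only(sid)    # all other links are disconnected
        if not probe(sid):  # abnormal communication marks this slave
            faulty.append(sid)
    # If every slave looks abnormal, the master itself becomes a suspect.
    master_suspect = len(slave_ids) > 0 and len(faulty) == len(slave_ids)
    return {"faulty": faulty, "master_suspect": master_suspect}
```

Probing every slave in strict isolation is the simplest variant; the text also allows slaves already found healthy to stay connected, which this sketch does not model.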
Further, when a faulty device (including a slave device or a master device) is located, the master device may perform a fault alarm for the faulty device, such as sending a fault alarm message to an upper management device, or notifying a maintenance worker of the currently faulty device by means of an indicator/buzzer.
It should be noted that when the fault location system includes more slave devices, the way in which the master device accurately locates the faulty device in the bus topology can be understood with reference to the relevant parts of the foregoing embodiments, and is not described again here.
As shown in fig. 6, which is a schematic flow chart of a fault location method in an embodiment of the present application, the method may be specifically applied to the master device in fig. 2 to 4, and the master device is connected to the first slave device and the second slave device through a bus respectively. When hardware failure occurs to a certain slave device, the whole bus topology may be in a hang-up state, and at this time, the master device may automatically perform fault detection to locate the slave device currently having the hardware failure. Illustratively, the method specifically includes:
S601: the master device controls the first isolation device to disconnect the connection between the master device and the first slave device, and controls the second isolation device to connect the master device and the second slave device.
S602: the master device determines whether communication between the master device and the second slave device is normal in a case where the master device is connected to the second slave device.
Since the master device and the plurality of slave devices are connected through the bus, if the bus topology has a failure such as hang-up, it is often difficult for the master device to directly determine which slave device or slave devices hung on the bus have the failure. For this reason, in this embodiment, the master device may establish a connection with only one slave device at the same time, so that the master device can determine whether the slave device fails by testing whether the communication between the master device and the slave device is normal.
In a specific implementation, the master device may send a first control signal to the first isolation device to control the first isolation device to disconnect the connection between the master device and the first slave device, and meanwhile, the master device may send a second control signal to the second isolation device to control the second isolation device to connect the master device and the second slave device. In this way, at that moment the master device is connected only to the second slave device, so that the master device may send service data or test data to the second slave device based on the connection, and determine whether the communication between the master device and the second slave device is abnormal based on how the service data or test data is sent and received. When the communication is abnormal, the likely cause is that the second slave device has failed, and the master device may determine that the second slave device has failed. When the communication is normal, the second slave device has not failed, and the failed device that has put the bus topology into a hang-up state is probably the first slave device.
For example, when the first isolation device and the second isolation device are both MOS transistors, the drains of the first isolation device and the second isolation device are respectively connected to the master device, the source of the first isolation device is connected to the first slave device, and the source of the second isolation device is connected to the second slave device. In this way, when the master device controls the first isolation device to disconnect the connection between the master device and the first slave device, specifically, the potential of the gate of the first isolation device may be controlled to be a first preset potential, so that the drain and the source of the first isolation device are disconnected under the effect that the potential of the gate of the first isolation device is the first preset potential, thereby realizing the disconnection between the master device and the first slave device. When the master device controls the second isolation device to connect the master device and the second slave device, specifically, the potential of the gate of the second isolation device may be controlled to be a second preset potential, so that the drain and the source of the second isolation device are turned on under the effect that the potential of the gate of the second isolation device is the second preset potential, thereby implementing connection between the master device and the second slave device.
Therefore, the master device can accurately locate the failed slave device, and does so more efficiently than maintenance personnel checking the devices one by one. Maintenance personnel then only need to replace the failed first slave device or second slave device, so that normal communication of the bus topology can be restored in a short time without replacing all devices on the bus, which effectively reduces the maintenance cost of the bus topology.
In some practical scenarios, the master device may misjudge that the bus topology is in a hang-up state. For example, the master device may wrongly conclude that the bus topology is abnormal because of a program operation error. Alternatively, when a new slave device accesses the bus topology through a hardware interface, an interference signal generated during the access process may degrade the quality of the signals transmitted on the bus, causing a short-lived communication abnormality that clears once the slave device has successfully accessed the bus topology. Therefore, in some possible embodiments, when the master device detects a communication abnormality in the bus topology, it may perform reset processing one or more times, and determine that the bus topology has a communication abnormality only when communication still fails after a preset number of consecutive resets. The master device then locates the faulty device through the isolation devices.
In this embodiment, the master device may determine that the first slave device fails according to the fact that the bus topology has a communication abnormality and the second slave device does not fail, but in a further possible implementation, the master device may further check whether the first slave device fails by testing whether the communication is normal.
Illustratively, the method may further comprise:
S603: the master device controls the first isolation device to connect the master device and the first slave device, and controls the second isolation device to disconnect the master device and the second slave device.
S604: the master device determines whether communication between the master device and the first slave device is normal in a case where the master device is connected to the first slave device.
In this embodiment, the master device may send a third control signal to the first isolation device, and control the first isolation device to connect the master device and the first slave device based on the third control signal; meanwhile, the master device may send a fourth control signal to the second isolation device, and control the second isolation device to disconnect the master device from the second slave device based on the fourth control signal. In this way, the master device is connected only to the first slave device at that moment; the master device may then determine whether communication between the master device and the first slave device is normal by sending service data or test data to the first slave device, and thereby determine whether the first slave device has failed.
Of course, in other possible embodiments, if the master device determines that the second slave device has not failed, even if the master device is connected to the second slave device, it will not generally affect the communication between the master device and the first slave device, so in one example, when the master device determines that the second slave device has not failed, the master device may control the second isolation device to maintain the connection between the master device and the second slave device in the process of verifying whether the first slave device has failed. Of course, when the master device determines that the communication between the master device and the second slave device is abnormal, in order to avoid that the fault of the second slave device causes the master device to make an erroneous judgment on whether the first slave device is faulty or not, the master device may control the second isolation device to disconnect the connection between the master device and the second slave device.
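The choice of isolation states for the verification step (S603) can be sketched as follows. The function name and the `keep_healthy_connected` flag are hypothetical; the flag models the optional behaviour of leaving a second slave device that is known to be healthy connected while the first slave device is checked.

```python
from typing import Dict


def isolation_plan_for_first_check(second_failed: bool,
                                   keep_healthy_connected: bool = False
                                   ) -> Dict[str, bool]:
    """Return the desired link state (True = connected) for each slave."""
    if second_failed:
        # A failed second slave must be isolated so that it cannot
        # distort the check of the first slave.
        second_connected = False
    else:
        # A healthy second slave may stay connected or be disconnected;
        # either choice is allowed by the text.
        second_connected = keep_healthy_connected
    return {"first": True, "second": second_connected}
```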
Optionally, when the master device determines that the first slave device or the second slave device has failed, the master device may mark the failed slave device as abnormal, for example by recording an identifier of the failed device, so that the failed first slave device or second slave device can be distinguished from the other devices attached to the bus.
Further, when determining that the first slave device fails or the second slave device fails, the master device may perform a failure alarm for the first slave device or perform a failure alarm for the second slave device. For example, the master device may report fault alarm information corresponding to the first slave device or the second slave device to the upper management device to notify the upper management device; alternatively, the master device may also alarm a fault to the user through an indicator/buzzer corresponding to the first slave device or an indicator/buzzer corresponding to the second slave device, for example, when the first slave device fails, the indicator corresponding to the first slave device is turned on, or the buzzer sounds an alarm.
In a further embodiment, when it is determined that both the communication between the master device and the first slave device and the communication between the master device and the second slave device are abnormal, the cause may be a failure of the master device itself, which makes communication with the plurality of slave devices abnormal. In this case, the master device may also perform a fault alarm for the master device.
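The marking and alarm rules above can be summarized in a short Python sketch. The function name `alarms_for` and the string labels are hypothetical, and treating the master as an additional suspect (rather than a replacement for the slave alarms) is one reading of the text, not a confirmed design.

```python
from typing import List


def alarms_for(slave1_ok: bool, slave2_ok: bool) -> List[str]:
    """Map the two communication verdicts to the devices to mark and alarm on."""
    alarms: List[str] = []
    if not slave1_ok:
        alarms.append("slave1")
    if not slave2_ok:
        alarms.append("slave2")
    # Both links abnormal even when tested in isolation: the master itself
    # may be the cause, so it is flagged in addition to the slaves.
    if not slave1_ok and not slave2_ok:
        alarms.append("master")
    return alarms
```

Each returned label could then be reported to an upper management device or mapped to an indicator or buzzer, as described above.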
As an example, the isolation devices (including the first isolation device and the second isolation device) in this embodiment may be circuits including MOS transistors or bipolar junction transistors (BJTs), or may be circuits built from discrete components or circuits including a switch chip, and the like, so as to implement on-off control of the connection between the master device and the slave device. The master device in this embodiment may be a device that is attached to the bus and has bus control rights; the slave device in this embodiment may be a device that is attached to the bus and is accessed by the master device, and may be, for example, an FRU component.
It should be noted that, in the present embodiment, one master device and two slave devices are taken as an example for illustration. In other possible embodiments, three or more slave devices may be attached to the bus. Taking the case in which the master device and three slave devices are connected through a bus as an example, the master device may control the on/off state of the link between the master device and the third slave device by using a third isolation device.
When the master device performs fault detection on the third slave device, the master device connects the master device with the third slave device by using the third isolation device, and simultaneously disconnects the master device from the first slave device and the second slave device, so that the master device is connected only with the third slave device at that time, and detects whether the third slave device has failed based on this connection. Similarly, when the master device performs fault detection on the first slave device or the second slave device, the master device may disconnect the master device from the other slave devices by controlling the corresponding isolation devices, and connect only with the slave device under detection, so as to perform fault detection on that slave device.
It should be noted that a slave device that the master device has determined to be fault-free may remain connected to the master device while the master device subsequently detects whether other slave devices have failed; a slave device that the master device has determined to be faulty is kept disconnected from the master device while the master device subsequently detects whether other slave devices have failed.
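The generalization to three or more slave devices can be sketched in Python as a loop over the slaves. The names `locate_faults`, `set_link`, `probe`, and `make_toy_bus`, as well as the toy bus behaviour, are assumptions introduced for illustration only; the text does not prescribe a particular test order or data structure.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple


def locate_faults(
    slaves: List[str],
    set_link: Callable[[str, bool], None],
    probe: Callable[[str], bool],
) -> Set[str]:
    """Test each slave in turn and return the set of slaves found faulty."""
    healthy: Set[str] = set()
    faulty: Set[str] = set()
    for target in slaves:
        for s in slaves:
            # Connect the slave under test and slaves already found healthy;
            # disconnect untested slaves and slaves already found faulty.
            set_link(s, s == target or s in healthy)
        if probe(target):
            healthy.add(target)
        else:
            faulty.add(target)
    return faulty


def make_toy_bus(
    slaves: Iterable[str], broken: Set[str]
) -> Tuple[Callable[[str, bool], None], Callable[[str], bool], Dict[str, bool]]:
    """Toy bus (hypothetical): any attached broken slave disturbs all traffic."""
    link: Dict[str, bool] = {s: True for s in slaves}

    def set_link(slave: str, connected: bool) -> None:
        link[slave] = connected

    def probe(slave: str) -> bool:
        return link[slave] and slave not in broken and not any(link[b] for b in broken)

    return set_link, probe, link
```

For example, with slaves `["s1", "s2", "s3"]` and only `s2` broken in the toy model, `locate_faults` returns `{"s2"}` and leaves `s2` disconnected at the end of the loop.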
In addition, the embodiment of the application also provides a computing device which can be applied to a master device, wherein the master device is respectively connected with a first slave device and a second slave device through a bus. The apparatus applied to the master device may implement the functions performed by the master device shown in fig. 2 to 6. Referring to fig. 7, an apparatus 700 may include:
a control module 701, configured to control the first isolation device to disconnect the master device from the first slave device, and control the second isolation device to connect the master device to the second slave device;
a determining module 702, configured to determine whether communication between the master device and the second slave device is normal.
In a possible implementation, the control module 701 is further configured to control the first isolation device to connect the master device and the first slave device, and control the second isolation device to disconnect the second slave device from the master device when it is determined that communication between the master device and the second slave device is abnormal;
the determining module 702 is further configured to determine whether communication between the master device and the first slave device is normal when the master device is connected to the first slave device.
In a possible implementation, the apparatus 700 further includes a marking module 703 for marking the second slave device as abnormal when determining that the communication between the master device and the second slave device is abnormal.
In a possible implementation, the apparatus 700 further includes a fault warning module 704, configured to perform fault warning for the second slave device when it is determined that communication between the master device and the second slave device is abnormal.
In a possible implementation manner, the fault warning module 704 is further configured to perform fault warning for the master device when it is determined that the communication between the master device and the first slave device and the communication between the master device and the second slave device are both abnormal.
In a possible implementation manner, the control module 701 is configured to control the first isolation device to disconnect the master device from the first slave device and control the second isolation device to connect the master device with the second slave device when a communication fault occurs in each of a preset number of consecutive reset processes of the master device.
In one possible embodiment, the first isolation device and the second isolation device comprise a circuit with a metal-oxide-semiconductor field-effect transistor (MOSFET) or a bipolar transistor.
In a possible embodiment, the first isolation device and the second isolation device are the MOSFETs, drains of the first isolation device and the second isolation device are respectively connected to the master device, a source of the first isolation device is connected to the first slave device, and a source of the second isolation device is connected to the second slave device;
the control module 701 is configured to control a potential of a gate of the first isolation device to be a first preset potential, so as to disconnect the master device from the first slave device, and control a potential of a gate of the second isolation device to be a second preset potential, so as to connect the master device to the second slave device.
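The trigger condition and the gate control described for the control module 701 can be sketched in Python as follows. This is an illustration under stated assumptions: the function names, the callback `reset_and_check`, and the numeric values of the two preset potentials are hypothetical, and which potential actually turns a given MOSFET on or off depends on the transistor type and circuit, which the text does not specify.

```python
from typing import Callable, Dict, List

# Hypothetical logic levels standing in for the two preset gate potentials.
FIRST_PRESET_POTENTIAL = 0   # assumed to disconnect the slave from the master
SECOND_PRESET_POTENTIAL = 1  # assumed to connect the slave to the master


def resets_exhausted(reset_and_check: Callable[[], bool], preset_count: int) -> bool:
    """Return True if communication is still faulty after each of
    `preset_count` consecutive resets, i.e. isolation testing should start.

    `reset_and_check` performs one reset and returns True if communication
    is normal afterwards.
    """
    for _ in range(preset_count):
        if reset_and_check():
            return False  # a reset cleared the fault; no isolation needed
    return True


def gate_levels_for_testing(target: str, slaves: List[str]) -> Dict[str, int]:
    """Gate potential per isolation device so that only `target` is connected."""
    return {
        s: SECOND_PRESET_POTENTIAL if s == target else FIRST_PRESET_POTENTIAL
        for s in slaves
    }
```

With two slaves, `gate_levels_for_testing("slave2", ["slave1", "slave2"])` yields the first preset potential for the first isolation device and the second preset potential for the second isolation device, matching the first step of the method.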
In addition, the embodiments of the present application provide another structure of a computing device applied to a master device. As shown in fig. 8, the computing device 800 may include a communication interface 810 and a processor 820. Optionally, the computing device 800 may also include a memory 830. The memory 830 may be disposed inside the computing device or outside the computing device. The actions performed by the master device in fig. 2 to 6 described above may, for example, be implemented by the processor 820. The processor 820 sends control signals and communication data through the communication interface 810 and is used to implement any of the methods described in fig. 6 as being performed by the master device. In implementation, each step of the processing flow may be completed by an integrated logic circuit of hardware in the processor 820 or by instructions in the form of software, so as to carry out the method executed by the master device in fig. 6. For brevity, no further description is provided herein. Program code executed by the processor 820 to implement the above-described methods may be stored in the memory 830. The memory 830 is coupled to the processor 820.
Some of the features of the embodiments of the present application may be performed/supported by the processor 820 executing program instructions or software code in the memory 830. The software components loaded on the memory 830 may be summarized functionally or logically, for example, the control module 701, the determination module 702, the tagging module 703, and the fault alert module 704 shown in fig. 7.
Any of the communication interfaces involved in the embodiments of the present application, such as the communication interface 810 in the computing device 800, may be a circuit, a bus, a transceiver, or any other device that can be used for information interaction. The other party to the interaction may, for example, be a device connected to the computing device, such as a slave device.
The processors referred to in the embodiments of the present application may be general purpose processors, digital signal processors, application specific integrated circuits, field programmable gate arrays or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or the like that implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present application. A general purpose processor may be a microprocessor or any conventional processor or the like. The steps of a method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware processor, or may be implemented by a combination of hardware and software modules in a processor.
The coupling in the embodiments of the present application is an indirect coupling or a communication connection between devices, units, or modules, and may be in an electrical, mechanical, or other form for information interaction between the devices, units, or modules.
The processor may cooperate with the memory. The memory may be a nonvolatile memory, such as a hard disk drive (HDD) or a solid-state drive (SSD), and may also be a volatile memory, such as a random-access memory (RAM). The memory may also be any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.
The embodiments of the present application do not limit the specific connection medium among the communication interface, the processor, and the memory. For example, the memory, the processor, and the communication interface may be connected by a bus. The bus may be divided into an address bus, a data bus, a control bus, and so on. Of course, the bus connecting the processor and the memory is not the bus connecting the aforementioned master and slave devices.
Based on the above embodiments, the present application further provides a computer storage medium in which a software program is stored. When the software program is read and executed by one or more processors, it may implement the method performed by the master device provided in any one or more of the above embodiments. The computer storage medium may include various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory, a random access memory, a magnetic disk, or an optical disk.
Based on the foregoing embodiments, an embodiment of the present application further provides a chip, where the chip includes a processor and is configured to implement the functions of the master device according to the foregoing embodiments, for example, to implement the method executed by the master device in fig. 6. Optionally, the chip further comprises a memory for storing the program instructions and data necessary for the processor. The chip may be a single chip, or may include a chip and other discrete devices.
The terms "first," "second," and the like in the description and in the claims of the present application and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the terms so used are interchangeable under appropriate circumstances and are merely descriptive of the various embodiments of the application and how objects of the same nature can be distinguished.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the embodiments of the present application without departing from the scope of the embodiments of the present application. Thus, if such modifications and variations of the embodiments of the present application fall within the scope of the claims of the present application and their equivalents, the present application is also intended to encompass such modifications and variations.

Claims (17)

1. A fault positioning system, characterized by comprising a master device, a first slave device, a second slave device, a first isolation device and a second isolation device, wherein the master device is respectively connected with the first slave device and the second slave device through buses;
the master device is configured to control the first isolation device to disconnect the connection between the master device and the first slave device, control the second isolation device to connect the master device and the second slave device, and determine whether communication between the master device and the second slave device is normal.
2. The system according to claim 1, wherein the master device is further configured to control the first isolation device to connect the master device and the first slave device, and to control the second isolation device to disconnect the second slave device from the master device and determine whether the communication between the master device and the first slave device is normal when it is determined that the communication between the master device and the second slave device is abnormal.
3. The system according to claim 1 or 2, wherein the master device is further configured to mark the second slave device as abnormal when determining that the communication between the master device and the second slave device is abnormal.
4. The system according to any one of claims 1 to 3, wherein the master device is further configured to perform a fault alarm for the second slave device when it is determined that the communication between the master device and the second slave device is abnormal.
5. The system according to any one of claims 2 or 4, wherein the master device is further configured to perform a fault alarm for the master device when it is determined that both the communication between the master device and the first slave device and the communication between the master device and the second slave device are abnormal.
6. The system according to any one of claims 1 to 5, wherein the master device is configured to control the first isolation device to disconnect the master device from the first slave device and control the second isolation device to connect the master device with the second slave device, and determine whether communication between the master device and the second slave device is normal, when there is a communication fault in each reset process of the master device for a preset number of consecutive times.
7. The system of any of claims 1 to 6, wherein the first isolation device and the second isolation device comprise circuits having metal-oxide semiconductor field effect transistors (MOSFETs) or bipolar transistors.
8. The system of claim 7, wherein the first isolation device and the second isolation device are the MOSFETs, wherein drains of the first isolation device and the second isolation device are respectively connected to the master device, wherein a source of the first isolation device is connected to the first slave device, and wherein a source of the second isolation device is connected to the second slave device;
the master device is configured to control a potential of a gate of the first isolation device to be a first preset potential so as to disconnect the master device from the first slave device, and control a potential of a gate of the second isolation device to be a second preset potential so as to connect the master device with the second slave device.
9. A fault locating method is applied to a master device, wherein the master device is respectively connected with a first slave device and a second slave device through buses, and the method comprises the following steps:
controlling the first isolation device to disconnect the connection between the master device and the first slave device, and controlling the second isolation device to connect the master device and the second slave device;
determining whether communication between the master device and the second slave device is normal.
10. The method of claim 9, further comprising:
controlling the first isolation device to connect the master device and the first slave device, and controlling the second isolation device to disconnect the second slave device from the master device when determining that the communication between the master device and the second slave device is abnormal;
determining whether communication between the master device and the first slave device is normal in a case of connection with the first slave device.
11. The method of claim 9 or 10, wherein the first isolation device and the second isolation device comprise circuits having metal-oxide-semiconductor field-effect transistors (MOSFETs) or bipolar transistors.
12. The method of claim 11, wherein the first isolation device and the second isolation device are the MOSFETs, wherein drains of the first isolation device and the second isolation device are respectively connected to the master device, wherein a source of the first isolation device is connected to the first slave device, and wherein a source of the second isolation device is connected to the second slave device;
the controlling the first isolation device to disconnect the master device from the first slave device and the controlling the second isolation device to connect the master device to the second slave device includes:
controlling the potential of the gate of the first isolation device to be a first preset potential so as to disconnect the master device from the first slave device, and controlling the potential of the gate of the second isolation device to be a second preset potential so as to connect the master device with the second slave device.
13. A computing apparatus, applied to a master device, the master device being respectively connected to a first slave device and a second slave device through a bus, the apparatus comprising:
a control module, configured to control the first isolation device to disconnect the master device from the first slave device, and to control the second isolation device to connect the master device and the second slave device;
a determining module for determining whether communication between the master device and the second slave device is normal.
14. The apparatus according to claim 13, wherein the control module is further configured to control the first isolation device to connect the master device and the first slave device, and to control the second isolation device to disconnect the second slave device from the master device when it is determined that the communication between the master device and the second slave device is abnormal;
the determining module is further configured to determine whether communication between the master device and the first slave device is normal when the master device is connected to the first slave device.
15. The apparatus of claim 13 or 14, wherein the first isolation device and the second isolation device comprise circuits having metal-oxide-semiconductor field-effect transistors (MOSFETs) or bipolar transistors.
16. The apparatus of claim 15, wherein the first isolation device and the second isolation device are the MOSFETs, wherein drains of the first isolation device and the second isolation device are respectively connected to the master device, wherein a source of the first isolation device is connected to the first slave device, and wherein a source of the second isolation device is connected to the second slave device;
the control module is specifically configured to control a potential of a gate of the first isolation device to be a first preset potential, so that the connection between the master device and the first slave device is disconnected, and to control a potential of a gate of the second isolation device to be a second preset potential, so as to connect the master device with the second slave device.
17. A computing device, the device comprising a memory and a processor, the memory to store software instructions; the processor invokes the memory-stored software instructions to perform the method of any of the preceding claims 9 to 12.
CN202010656493.XA 2020-07-09 2020-07-09 Fault positioning system, method and computing device Pending CN113992501A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010656493.XA CN113992501A (en) 2020-07-09 2020-07-09 Fault positioning system, method and computing device

Publications (1)

Publication Number Publication Date
CN113992501A true CN113992501A (en) 2022-01-28

Family

ID=79731327

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010656493.XA Pending CN113992501A (en) 2020-07-09 2020-07-09 Fault positioning system, method and computing device

Country Status (1)

Country Link
CN (1) CN113992501A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103412531A (en) * 2013-07-30 2013-11-27 华为数字技术(苏州)有限公司 Bus control method and device
US20190272252A1 (en) * 2018-01-09 2019-09-05 Shenzhen GOODIX Technology Co., Ltd. Method of processing deadlock of i2c bus, electronic device and communication system
CN108073540A (en) * 2018-02-11 2018-05-25 云丁网络技术(北京)有限公司 I2C bus systems, warping apparatus investigation method

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115664932A (en) * 2022-10-17 2023-01-31 厦门海辰储能科技股份有限公司 Energy block parallel communication method and device
CN115664932B (en) * 2022-10-17 2024-01-26 厦门海辰储能科技股份有限公司 Energy block parallel communication method and device
WO2024087661A1 (en) * 2022-10-26 2024-05-02 华为技术有限公司 Fault location method, apparatus and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20220128