WO2020088351A1 - Method for sending device information, computer device and distributed computer device system

Info

Publication number: WO2020088351A1
Application number: PCT/CN2019/113147
Authority: WO
Inventors: 岑月宁
Original assignee: 华为技术有限公司
Priority date: 2018-11-01
Filing date: 2019-10-25
Publication date: 2020-05-07
Also published as: CN109831350A

Abstract

Provided in the present application are a computer device, a distributed computer device system and a method for sending device information, so as to solve the problem of poor real-time performance when heartbeat is used to detect failures of distributed nodes. The method provided in the present application comprises: acquiring reset information of a computer device; generating, according to the acquired reset information, a reset notification message that contains the reset information; and sending the reset notification message to other devices in a distributed system, so as to rapidly notify those devices of the reset information of the present device. Compared with the prior-art manner of using heartbeat to detect whether other devices have been reset, this improves the efficiency of transmitting reset information and also prevents misjudgment.

Description

Device information sending method, computer device and distributed computer device system

This application claims priority to the Chinese patent application filed with the State Intellectual Property Office of China on December 29, 2018, with application number 201811632716.8 and entitled "Method for Sending Equipment Information, Computer Equipment and Distributed Computer Equipment System". That patent application in turn claims priority to the Chinese patent application filed with the China Patent Office on November 01, 2018, with application number 201811294576.8 and entitled "Method of sending device information, computer equipment and distributed computer equipment system", the entire contents of which are incorporated by reference in this application.

Technical field

This application relates to the field of information technology, in particular to a method for sending device information, a computer device, and a distributed computer device system.

Background

Distributed systems usually include distributed computing systems and distributed storage systems.

A system composed of a group of computers using distributed computing is called a distributed computing system. The distributed computing system divides the project data that needs a large amount of calculation into small pieces, which are calculated by multiple computing nodes, such as a server with a computing function, and then the results of the calculation are unified and merged to obtain the data conclusion.

A distributed storage system stores data in a distributed manner on multiple independent devices. It adopts a scalable system structure, uses multiple storage nodes, such as storage servers, to share the storage load, and uses a location server to locate stored information. A distributed storage system not only improves the reliability, availability and access efficiency of the system, but is also easy to expand.

The computing nodes in a distributed computing system and the storage nodes in a distributed storage system are collectively called distributed nodes.

At present, the industry can detect a reset fault or power failure of a distributed node only through the heartbeat between nodes. Detecting distributed node failures through heartbeat suffers from misjudgment and poor real-time performance, and cannot meet the service switching requirements of high-end scenarios (for example, banks).

Summary of the invention

Embodiments of the present application provide a computer device, a distributed computer device system, and a method for sending device information, to solve the problem of poor real-time performance of heartbeat detection of distributed node faults.

In a first aspect, an embodiment of the present application provides a computer device, where the computer device is a computer device in a distributed system.

The computer device includes a processor and further includes a message sending unit, and the message sending unit and the processor are connected by a bus.

The processor is configured to acquire the reset information of the computer device when the computer device is reset, and transmit the reset information of the computer device to the message sending unit.

The message sending unit is configured to generate, based on the reset information of the computer device, a reset notification message including the reset information of the computer device, and send the reset notification message to other computer devices in the distributed system.

The above computer device can obtain its own reset information, generate a reset notification message containing that reset information, and send the message to other devices in the distributed system, so that the other devices are quickly notified of the reset information of the device. Compared with the prior-art method of detecting through heartbeat whether other devices are reset, the efficiency of transmitting reset information is improved. Further, since no preset threshold needs to be set for heartbeat detection, misjudgment caused by improper threshold setting is avoided.

Optionally, the computer device may be a computing server device or a storage server device.

Optionally, the computer device may further include a main memory or an auxiliary memory.

Optionally, the computer device and other devices in the distributed system may communicate through networks such as Gigabit Ethernet (GE) or InfiniBand (IB).

Optionally, the message sending unit may send the notification message through a private network or a public network.

Optionally, the message sending unit may send the notification message to other computer devices in the distributed system by means of directed messages. The message sending unit may also send the notification message by sending a broadcast message.

Optionally, the processor may be a central processing unit (CPU), and the CPU may be an X86 CPU or an ARM (Advanced RISC Machine) CPU.

Optionally, the message sending unit may be a PCIe (Peripheral Component Interconnect Express) intelligent network card. The CPU and the message sending unit may be connected through a PCIe bus.

In a possible implementation manner of the first aspect, the operating system in the computer device includes a notification chain about reset; before the computer device is reset, the processor obtains the reset information of the computer device from the notification chain through a preset function.

In a possible implementation manner of the first aspect,

The operating system in the computer device includes a reset detection module,

The reset detection module is registered on a reset notification chain in the operating system of the computer device, and obtains information about the reset of the computer device on the reset notification chain through a callback function;

The processor obtains reset information of the computer device through the reset detection module.

In a possible implementation manner of the first aspect,

The message sending unit includes a microcode module;

The microcode module is used to generate the reset notification message and send the reset notification message to other computer devices in the distributed system.

Optionally, the above microcode module may be implemented by firmware (FW).

Optionally, the reset detection module is a module in an operating system run by the computer device.

Optionally, the reset detection module sends the reset information of the computer device through a private protocol with the microcode module.

In a possible implementation manner of the first aspect, the computer device further includes a baseboard management controller (BMC); the processor transmits the reset information of the computer device to the message sending unit through the BMC.

In a possible implementation manner of the first aspect, the computer device further includes a power supply module, and the power supply module is connected to the message sending unit through a general-purpose input/output (GPIO) pin;

The power supply module is used to transfer the power-down information of the computer device to the message sending unit by triggering a transition of the pin when the computer device is powered off;

The message sending unit is further configured to generate, based on the power-down information of the computer device, a power-down notification message containing the power-down information of the computer device, and send the power-down notification message to other computer devices in the distributed system. The above-mentioned computer device can generate a power-down notification message containing power-down information based on the obtained power-down information of the device and send it to other devices in the distributed system, so that the other devices are quickly notified of the power-down information of the device. Compared with the prior-art method of detecting through heartbeat whether other devices are powered off, the efficiency of transmitting power-down information is improved. Further, since no preset threshold needs to be set for heartbeat detection, misjudgment caused by improper threshold setting is avoided.

Optionally, the power supply module may trigger the transition of the GPIO pin through a power-down indication signal.

Optionally, the power-down indication signal may be a PS_OK signal, or other signals used to indicate the mains power-down.

In a possible implementation manner of the first aspect,

The message sending unit is a baseboard management controller BMC, and the BMC further includes a notification module;

The notification module generates the reset notification message according to the reset information of the computer device acquired by the BMC, and sends the reset notification message to the other computer devices in the distributed system.

Optionally, the notification module in the BMC sends the reset notification message to the BMC in the other computer equipment of the distributed system.

Optionally, the notification module in the BMC may send the reset notification message to the BMC in the other computer device through an out-of-band system.

In a possible implementation manner of the first aspect,

The computer equipment also includes a power supply module;

The power supply module is also used to transmit the power-down information of the computer device to the BMC by triggering a transition of the pin when the computer device is powered off;

The notification module is further configured to generate, according to the acquired power-down information of the computer device, a power-down notification message including the power-down information of the computer device, and send the power-down notification message to other computer devices in the distributed system.

In a second aspect, an embodiment of the present application provides a distributed computer device system, including at least two computer devices in the first aspect.

Optionally, the distributed computer device system may be a distributed computing system, a distributed storage system, or a distributed hybrid system, where a distributed hybrid system is a system that includes both computing devices and storage devices.

In a third aspect, an embodiment of the present application provides a method for sending device information. The method includes:

The processor in the computer device acquires the reset information of the computer device before resetting the computer device, and transmits the reset information of the computer device to the message sending unit in the computer device;

The message sending unit receives the reset information of the computer device;

The message sending unit generates, based on the reset information of the computer device, a reset notification message containing the reset information of the computer device, and sends the reset notification message to other computer devices in the distributed system where the computer device is located.

The above method obtains the reset information of the device, generates a reset notification message containing that reset information, and sends it to other devices in the distributed system, so that the other devices are quickly notified of the reset information of the device. Compared with the prior-art method of detecting through heartbeat whether other devices are reset or powered off, the efficiency of transmitting reset information is improved. Further, since no preset threshold needs to be set for heartbeat detection, misjudgment caused by improper threshold setting is avoided.

Optionally, in a possible implementation manner of the third aspect, the processor obtains reset information of the computer device through a preset function and a notification chain about reset in the operating system of the computer device.

In a possible implementation manner of the third aspect, the method further includes:

The preset function is a callback function, and the callback function is registered on the notification chain;

The processor acquiring the reset information of the computer device through a preset function and a notification chain about reset in the operating system of the computer device includes:

The processor obtains the reset information of the computer device from the notification chain through the callback function.

In a possible implementation manner of the third aspect, the processor transmits the reset information of the computer device to the message sending unit through the BMC in the computer device.

In a possible implementation manner of the third aspect, the message sending unit is a BMC in the computer device.

In a possible implementation manner of the third aspect, when the computer device is powered off, the message sending unit acquires the power-down information of the computer device through a pin transition, generates a power-down notification message that includes the power-down information of the computer device, and sends the power-down notification message to the other computer devices in the distributed system.

The above method generates a power-down notification message containing power-down information based on the obtained power-down information of the device and sends it to other devices in the distributed system, so that the other devices are quickly notified of the power-down information of the device. Compared with the prior-art method of detecting through heartbeat whether other devices are powered off, the efficiency of transmitting power-down information is improved. Further, since no preset threshold needs to be set for heartbeat detection, misjudgment caused by improper threshold setting is avoided.

According to a fourth aspect, an embodiment of the present application provides a computer program product. The computer program product includes a computer program stored in a computer-readable storage medium, and the computer program is loaded by a controller to implement the method of the third aspect or of any possible implementation manner of the third aspect.

According to a fifth aspect, an embodiment of the present application provides a non-volatile computer-readable storage medium for storing a computer program, where the computer program is loaded by a processor to execute instructions for the method of the third aspect or of any possible implementation manner of the third aspect.

According to a sixth aspect, the embodiments of the present application provide a chip including programmable logic circuits and/or program instructions which, when the chip is running, are used to implement the method of the third aspect or of any possible implementation manner of the third aspect.

Brief description of the drawings

In order to explain the technical solutions in the embodiments of the present invention more clearly, the drawings required in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention. Those of ordinary skill in the art can obtain other drawings based on these drawings without creative effort.

FIG. 1 is a schematic structural diagram of a distributed system provided by an embodiment of this application;

FIG. 2 is a schematic structural diagram of an implementation manner of a distributed system provided by an embodiment of the present application;

FIG. 3 is a schematic structural diagram of a specific implementation manner of the distributed system shown in FIG. 2;

FIG. 4 is a schematic structural diagram of another specific implementation manner of the distributed system shown in FIG. 2;

FIG. 5 is a schematic structural diagram of another implementation manner of a distributed system provided by an embodiment of the present application;

FIG. 6 is a schematic structural diagram of another implementation manner of a distributed system provided by an embodiment of the present application;

FIG. 7 is a schematic structural diagram of a specific implementation manner of the distributed system shown in FIG. 6;

FIG. 8 is a schematic structural diagram of another specific implementation manner of the distributed system shown in FIG. 6;

FIG. 9A is a schematic structural diagram of a computer device 900 according to an embodiment of the present application;

FIG. 9B is a schematic structural diagram of another implementation manner of a computer device 900 provided by an embodiment of this application;

FIG. 10 is a schematic structural diagram of another implementation manner of a computer device 900 provided by an embodiment of this application;

FIG. 11 is a schematic structural diagram of another implementation manner of a computer device 900 provided by an embodiment of this application;

FIG. 12 is a schematic flowchart of a method for sending device information according to an embodiment of the present application.

Detailed description

The following describes the embodiments of the present invention with reference to the drawings.

FIG. 1 is a schematic structural diagram of a distributed system including node 1, node 2 and node 3. The distributed system may be a distributed computing system or a distributed storage system. Correspondingly, node 1, node 2 and node 3 may be computing nodes or storage nodes. In the embodiment of the present application, the node may be a computer device, for example, a computing server or a storage server; the node may also be another device with electronic information processing capabilities, such as a device with information communication capabilities.

Optionally, the distributed system in the embodiment of the present application may also be a cluster system. The cluster system includes more than two nodes, and each node can run cluster management software to manage the nodes in the cluster. The cluster management software is software that manages the nodes in the distributed system, for example, it is used to report the status of each node in the distributed system and isolate the faulty node. The cluster management software can collect the heartbeat detection results of all nodes in the distributed system, comprehensively determine whether a node is faulty or abnormal, and whether service switching is required.

During the operation of a distributed system, heartbeat detection is usually used to determine whether a node is abnormal. For example, each node sends probe packets (such as ping packets) to other nodes, and determines whether a node is abnormal from the delay of the ping response. Taking the system shown in FIG. 1 as an example, node 2 sends ping packets to node 1; the packet size can be 1024 bytes or 64 bytes. Node 2 measures the delay with which node 1 responds to each ping packet. If a delay exceeds a preset delay threshold, for example 2 seconds, node 2 counts the packets whose delay exceeds that threshold within a preset period; if this count exceeds a preset count threshold, for example 5, node 2 determines that node 1 is abnormal. Similarly, node 3 determines whether node 1 is abnormal by sending ping packets to node 1. If node 3 also determines that node 1 is abnormal, node 1 is determined to be abnormal, and the heartbeat detection of node 1 is complete.
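The heartbeat judgement described above can be sketched as follows. This is only an illustration and not part of the application: the function names, the use of None for a reply that never arrived, and the rule that every observing node must agree are assumptions of the sketch. The 2-second delay threshold and the count threshold of 5 are the example values given in the text.

```python
from typing import Optional, Sequence

DELAY_THRESHOLD_S = 2.0   # example delay threshold from the text
COUNT_THRESHOLD = 5       # example count threshold from the text


def observer_sees_abnormal(delays: Sequence[Optional[float]],
                           delay_threshold: float = DELAY_THRESHOLD_S,
                           count_threshold: int = COUNT_THRESHOLD) -> bool:
    """Judge one peer from the ping delays collected in one preset period.

    A delay of None stands for a reply that never arrived (an assumption of
    this sketch) and is counted as exceeding the delay threshold.
    """
    slow = sum(1 for d in delays if d is None or d > delay_threshold)
    return slow > count_threshold


def node_is_abnormal(per_observer_delays: Sequence[Sequence[Optional[float]]]) -> bool:
    """Declare the node abnormal only if every observing node agrees."""
    return bool(per_observer_delays) and all(
        observer_sees_abnormal(d) for d in per_observer_delays)
```

Because a verdict requires more than the count threshold of slow replies within a period, the detection delay grows with both thresholds, and a poorly chosen threshold can produce a wrong verdict in either direction.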

In FIG. 1, if node 1 is reset, the heartbeat detection between node 2 and node 1 will be abnormal, and the heartbeat detection between node 3 and node 1 will also be abnormal. In this way, the cluster management software in nodes 2 and 3 can determine the failure of node 1 according to the result of the heartbeat detection, and perform management operations such as service switching. This method of heartbeat detection needs to send probe packets regularly, and it takes a long period to judge whether there is an abnormality. Usually, the time required is 5.5 seconds or more (6 to 8 seconds for end-to-end service switching). Such detection has poor real-time performance and cannot meet the service switching requirements of high-end scenarios (for example, banks).

Embodiments of the present application provide a computer device, a distributed computer device system, and a method for sending device information, to solve the problem of poor real-time performance in the manner of detecting a distributed node failure through heartbeat.

FIG. 2 is a schematic structural diagram of an implementation manner of a distributed system provided by an embodiment of the present application. As shown in FIG. 2, the distributed system includes node 100, node 200, and node 300. The node 100 includes a control unit 101, an interface unit 102, and a power module 103; the node 200 includes a control unit 201, an interface unit 202, and a power module 203; the node 300 includes a control unit 301, an interface unit 302, and a power module 303. It can be understood that FIG. 2 is only for the convenience of describing the technical solution of the present application, and illustrates the number of nodes and the components included in the node. In specific implementation, more nodes may be included, or the node may also include other components. For example, the node may also include a main memory (such as a random access memory (RAM), etc.) and an auxiliary memory (such as a hard disk, etc.), which are not listed one by one.

The nodes shown in FIG. 2 can communicate through networks such as GE or IB. The embodiments of the present application do not limit specific network protocols or network forms. Nodes can communicate through their interface units; for example, the node 100 and the node 200 communicate through the interface unit 102 and the interface unit 202.

Taking the node 100 as an example, the control unit 101 may be a processor, for example, a CPU, including but not limited to an X86 CPU or an ARM CPU; the control unit 101 may also be a heterogeneous processor or the like. The interface unit 102 may be a PCIe intelligent network card or another network card device. The power supply module 103 is a device that supplies power to the node 100. For example, the power supply module 103 may be a hardware module that converts the 220V voltage input from the mains into a 12V voltage that can be used by other components in the node 100. The control unit 101 and the interface unit 102 may be connected by a bus (for example, a PCIe bus), and the power supply module 103 may be connected to the GPIO pins of the interface unit 102. When the node 100 is reset, for example, when the node 100 is ready to be reset, the control unit 101 may send an instruction to the interface unit 102 to transfer the reset information of the node 100 to the interface unit 102. After the control unit 101 sends the instruction to the interface unit 102, the node 100 can initiate the reset.

It can be understood that the instruction sent by the control unit 101 to the interface unit 102 may take the form of a command, a message, or a hardware signal. The embodiment of the present application does not limit the specific form in which the control unit 101 sends an instruction to the interface unit 102; any form in which the control unit 101 can transfer the reset information of the node 100 to the interface unit 102 is within the scope covered by the embodiment of the present application.

Optionally, resetting the node 100 in the embodiment of the present application may be restarting of the node 100; correspondingly, the information of the node 100 resetting in the embodiment of the present application may be information of restarting the node 100.

The interface unit 102 sends the reset information of the node 100 to the node 200 and the node 300 through the network according to the received notification of the reset of the node 100. For example, the interface unit 102 may send, through the network, a reset notification message that includes the identification of the node 100 and the information about the reset of the node 100. After the node 200 and the node 300 obtain the reset notification message sent by the node 100, they can perform corresponding management operations through the cluster management software. Taking node 200 as an example, after node 200 receives the reset notification message sent by node 100, the management module (such as cluster management software) in node 200 starts the corresponding processing, including but not limited to starting the process of isolating node 100, in order to avoid failures caused by continuing to access the node 100.
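The exchange above can be illustrated with a minimal sketch. The application states only that the reset notification message carries the identification of the node and its reset information; the JSON encoding, the field names, and the ClusterManager class below are hypothetical choices made for illustration.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ResetNotification:
    node_id: str          # identification of the node that is resetting
    event: str = "reset"  # reset information; the value is a placeholder


def encode(msg: ResetNotification) -> bytes:
    """Serialize the message (JSON is an assumption, not the patent's format)."""
    return json.dumps(asdict(msg)).encode("utf-8")


def decode(raw: bytes) -> ResetNotification:
    return ResetNotification(**json.loads(raw.decode("utf-8")))


class ClusterManager:
    """Stand-in for the management module on a receiving node."""

    def __init__(self) -> None:
        self.isolated = set()

    def on_message(self, raw: bytes) -> None:
        msg = decode(raw)
        if msg.event == "reset":
            # Isolate the sender so that it is no longer accessed.
            self.isolated.add(msg.node_id)
```

In this sketch the receiver acts on a single message, so no delay or count threshold is involved in the decision.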

Through the above method, when the node 100 is reset, the interface unit 102 in the node 100, which is responsible for sending the reset notification message, acquires the reset information of the node 100 and sends it to the other nodes in the distributed system, enabling the node 200 and the node 300 to quickly learn the reset information of the node 100. Compared with detection and notification through heartbeat, other nodes obtain the reset information of the node 100 more efficiently and with better real-time performance.

In addition, since the interface unit 102 in the node 100 obtains the reset information directly, the misjudgment that arises in heartbeat detection when the preset thresholds it relies on are set unreasonably is avoided, which prevents nodes from being erroneously isolated.

When the node 100 is powered off, the power supply module 103 in the node 100 triggers a power-down indication signal, and the power-down indication signal can trigger a transition of the pin of the interface unit 102, for example, from a high level to a low level, or from a low level to a high level. The interface unit 102 learns of the power-down of the node 100 from the pin transition.

The interface unit 102 may generate, according to the power-down information of the node 100, a power-down notification message containing that information, and send the power-down notification message to other nodes in the distributed system, such as the node 200 and the node 300.
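The pin-transition logic can be modelled in software as follows. This is a hedged sketch only: real firmware would typically react to a GPIO interrupt rather than to polled samples, and the PowerDownWatcher class, the message fields, and the choice to notify only once are assumptions of the sketch.

```python
from typing import Callable, Optional


class PowerDownWatcher:
    """Watches a sampled pin level and reports the first transition."""

    def __init__(self, node_id: str, send: Callable[[dict], None]) -> None:
        self.node_id = node_id
        self.send = send
        self.last_level: Optional[int] = None
        self.notified = False

    def sample(self, level: int) -> None:
        """Feed one pin reading (0 = low, 1 = high)."""
        changed = self.last_level is not None and level != self.last_level
        self.last_level = level
        if changed and not self.notified:
            # Either edge direction counts as a power-down indication.
            self.notified = True
            self.send({"node_id": self.node_id, "event": "power_down"})
```

For example, feeding the readings 1, 1, 0 causes a single power-down notification to be passed to the send callable on the third reading.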

Since the power supply module 103 converts the mains voltage into a voltage that can be used by other components in the node 100, the power supply module 103 triggers the transition of the pin when it senses that the mains power is lost. After the mains power is lost, the power supply module 103 continues for a time to convert the power received before the loss into a voltage that can be used by the node 100. During this period, the power-down information of the node 100 can be quickly transmitted to the interface unit 102 through the pin transition, and the interface unit 102 generates a power-down notification message for the power-down of the node 100 and sends the power-down notification message to the node 200 and the node 300.

In this way, when the node 100 is powered off, the node 100 can notify the interface unit 102 through the power supply module 103, and the interface unit 102 can send the power-down information of the node 100 to the other nodes in the distributed system, so that the node 200 and the node 300 quickly learn that the node 100 is powered off. Compared with detection and notification through heartbeat, the power-down notification method of the present application improves the efficiency of notification and has good real-time performance. It also avoids the misjudgment that arises in heartbeat detection when the preset thresholds it relies on are set unreasonably, which prevents nodes from being erroneously isolated.

The manner in which a node of the distributed system shown in FIG. 2 transmits reset or power-down information is described in detail below through a specific example. In specific implementation, the control unit in each node of the distributed system shown in FIG. 2 may further include a reset detection module, and the interface unit may further include a microcode module. The microcode module may be FW. As shown in FIG. 3, the control unit 101 includes a reset detection module 1011, and the interface unit 102 includes a microcode module 1021; the control unit 201 includes a reset detection module 2011, and the interface unit 202 includes a microcode module 2021; the control unit 301 includes a reset detection module 3011, and the interface unit 302 includes a microcode module 3021.

It can be understood that the statement that the control unit includes a reset detection module specifically means that the operating system run by the control unit includes the reset detection module. For brevity, the embodiments of the present application and the drawings describe "the operating system run by the control unit includes a reset detection module" as "the control unit includes a reset detection module", and describe the functions that the control unit implements by executing the code corresponding to the reset detection module as functions implemented by the reset detection module.

Taking the node 100 as an example, the reset detection module 1011 is used to detect whether the node 100 is to be reset and, before the node 100 is reset, to send a notification message to the interface unit 102. The reset detection module 1011 may send the notification message to the microcode module 1021 in the interface unit 102 to transfer the reset information of the node 100 to the interface unit 102. After receiving the notification message sent by the reset detection module 1011, the microcode module 1021 sends the reset information of the node 100 to the node 200 and the node 300 through the network.

The following takes as an example the case in which the node 100 runs the Linux operating system and software in the node 100 requires a reset, and describes how the reset detection module 1011 obtains reset information and notifies the microcode module 1021.

The reset detection module 1011 can be registered on a notification chain provided by the Linux operating system. The notification chain is a notification mechanism provided by the Linux operating system. The Linux kernel contains multiple subsystems. Most of them are independent of each other, and one subsystem can learn of events in another subsystem through a notification chain. The notification chain can be used only between kernel subsystems, not for notifying events between the kernel and user space. The notification chain is a linked list of functions, and each node of the linked list has a function registered on it. When an event occurs, the functions corresponding to all nodes of the linked list are executed. A notification chain therefore has a notifier and a receiver. The receiver defines the processing function to be executed when the event occurs, namely the callback function, and registers it on the notification chain in advance. When an event occurs and the notifier sends out a notification, the receiver obtains the corresponding event through the callback function.
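The register-then-notify pattern described above can be modelled in user space as follows. This is an illustrative model and not kernel code: the NotifierChain class and its method names are hypothetical, and a plain Python list stands in for the linked list. In the Linux kernel itself, a callback is attached to the reboot notifier list with register_reboot_notifier; the application does not state which specific chain its reset detection module uses.

```python
from typing import Any, Callable, List

Callback = Callable[[str, Any], None]


class NotifierChain:
    """User-space model of a notification chain: an ordered list of callbacks."""

    def __init__(self) -> None:
        self._callbacks: List[Callback] = []

    def register(self, callback: Callback) -> None:
        """Called by a receiver in advance, before any event occurs."""
        self._callbacks.append(callback)

    def unregister(self, callback: Callback) -> None:
        self._callbacks.remove(callback)

    def call_chain(self, event: str, data: Any = None) -> int:
        """Called by the notifier; runs every registered callback in order."""
        for callback in self._callbacks:
            callback(event, data)
        return len(self._callbacks)
```

A receiver that registers a callback before the event is raised sees the event name and its data as soon as the notifier calls call_chain, without polling.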

Before a piece of software running in the node 100 is reset, a notification message is sent to the notification chain, and the notification chain may be a notification chain about resetting. The reset detection module 1011 registered on the notification chain can obtain the information that the node 100 is about to be reset by calling a callback function. The reset detection module 1011 may transmit the reset information of the node 100 to the microcode module 1021 in the interface unit 102 through a communication channel between the control unit 101 and the interface unit 102, for example, a PCIE 3.0 communication channel. Optionally, the notification message sent by the reset detection module 1011 to the microcode module 1021 includes the identifier of the node 100 and the information that the node 100 is about to be reset. After the reset detection module 1011 transmits the reset information of the node 100 to the microcode module 1021 in the interface unit 102, it can notify the reset module of the operating system of the node 100 to start the reset.

Optionally, the reset detection module 1011 may send the notification message that the node 100 is about to be reset according to a private protocol with the microcode module 1021. For example, the reset detection module 1011 uses a private interface command predefined with the microcode module 1021 to carry the information that the node 100 is about to be reset and the identifier of the node 100, and sends the private interface command to the microcode module 1021. After receiving the message sent by the reset detection module 1011 through the private protocol, the microcode module 1021 generates a reset notification message according to the identifier of the node 100 and the information that the node 100 is about to be reset, and sends the generated reset notification message to the node 200 and the node 300 through the network.
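As a minimal sketch of such a private interface command, the following Python example packs a command code and a node identifier into a fixed binary layout and then derives a reset notification from it. The layout (one command byte followed by a four-byte identifier), the constant `CMD_NODE_RESETTING`, and the dictionary fields are assumptions made for illustration; the application does not define the format of the private protocol.

```python
import struct

# Hypothetical wire layout: 1-byte command code, 4-byte node identifier,
# both in network byte order.
CMD_NODE_RESETTING = 0x01
_COMMAND_FORMAT = "!BI"


def build_private_command(node_id: int) -> bytes:
    """Reset detection module side: pack the command for the microcode module."""
    return struct.pack(_COMMAND_FORMAT, CMD_NODE_RESETTING, node_id)


def build_reset_notification(command: bytes) -> dict:
    """Microcode module side: turn the private command into a reset notification."""
    code, node_id = struct.unpack(_COMMAND_FORMAT, command)
    if code != CMD_NODE_RESETTING:
        raise ValueError(f"unexpected command code {code:#x}")
    return {"type": "reset", "node_id": node_id, "state": "about_to_reset"}


command = build_private_command(100)
notification = build_reset_notification(command)
```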

The microcode module 1021 can send the generated reset notification message to the node 200 and the node 300 through the network in various ways: the reset notification message may be sent as a directed message or as a broadcast message, and may be sent through a private network or through a public network. This application does not limit the specific implementation. For example, when the distributed system composed of the node 100, the node 200, and the node 300 has its own private network, the microcode module 1021 may send the reset notification message through the private network, either as a directed message or as a broadcast message. If the distributed system composed of the node 100, the node 200, and the node 300 does not have its own private network, the microcode module 1021 may send the reset notification message to the node 200 and the node 300 through the public network. The reset notification message may be a directed message. In this case, the reset notification message sent by the microcode module 1021 may carry the IP address of the node 100, the IP addresses of the node 200 and the node 300, and so on. It can be understood that transmitting the notification message through the private network is more efficient and closer to real time than transmitting it through the public network.
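The choice between a directed and a broadcast transmission can be sketched as follows. The example only plans which datagrams would be sent and does not open a socket; the port number, the example addresses, and the function name `plan_transmissions` are hypothetical, and the use of datagrams at all is an assumption rather than something the application specifies.

```python
from typing import List, Optional, Tuple

# Hypothetical values, for illustration only.
BROADCAST_ADDRESS = "255.255.255.255"
NOTIFY_PORT = 50000


def plan_transmissions(
    payload: bytes, peers: Optional[List[str]] = None
) -> List[Tuple[str, int, bytes]]:
    """Return (address, port, payload) tuples for a directed or broadcast send."""
    if peers:
        # Directed: one datagram per known peer address.
        return [(peer, NOTIFY_PORT, payload) for peer in peers]
    # Broadcast: a single datagram that every listening node can receive.
    return [(BROADCAST_ADDRESS, NOTIFY_PORT, payload)]


directed = plan_transmissions(b"reset:100", ["192.0.2.20", "192.0.2.30"])
broadcast = plan_transmissions(b"reset:100")
```

A directed send requires the peer addresses to be known in advance, whereas a broadcast needs no per-peer configuration but reaches every listener on the network segment.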

Other nodes (node 200 and node 300) receive the reset notification message sent by node 100 by listening for the message or by directly receiving it, obtain the information that node 100 is about to be reset from the received reset notification message, and perform management operations such as service switching or isolation through the cluster management software.

In another implementation manner of the embodiment of the present application, the node 100 may also become abnormal due to a power failure. In this case, it is also necessary to quickly notify the node 200 and the node 300 of the information about the abnormal power supply of the node 100. Specifically, as shown in FIG. 3, when the power supply in the node 100 is abnormal, the power module 103 triggers a power-down indication signal to notify the interface unit 102. For example, the power module 103 may use a pin to indicate power-down, thereby generating the power-down indication signal. In a specific implementation, the power-down indication signal may be a PS_OK signal, or another signal used to indicate mains power-down.

The power module 103 can trigger a transition of a pin of the interface unit 102 to transmit the information that the node 100 is powered down to the interface unit 102. For example, the power module 103 generates the PS_OK signal by triggering a pin transition, and the PS_OK signal triggers a transition of the pin of the interface unit 102, for example, from a high level to a low level. Optionally, the power module 103 may also trigger a pin defined by the microcode module 1021 to transmit the power-down information.

The microcode module 1021 in the interface unit 102 can detect the signal transition of the pin. When it detects that the pin has transitioned, the microcode module 1021 obtains the information that the node 100 is powered down. For example, when the pin transitions from a high level to a low level, the microcode module 1021 obtains the information that the node 100 is powered down.
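Detecting a high-to-low transition of the pin can be sketched as a simple falling-edge detector over sampled pin levels. Real microcode would typically react to a hardware interrupt rather than scan a list of samples; the sample values and the function name `first_falling_edge` are hypothetical.

```python
from typing import Iterable, Optional

HIGH, LOW = 1, 0


def first_falling_edge(samples: Iterable[int]) -> Optional[int]:
    """Return the index of the first HIGH-to-LOW transition, or None."""
    previous = None
    for index, level in enumerate(samples):
        if previous == HIGH and level == LOW:
            return index
        previous = level
    return None


# Hypothetical sampled levels of the pin driven by the PS_OK signal.
pin_samples = [HIGH, HIGH, HIGH, LOW, LOW]
power_down_index = first_falling_edge(pin_samples)
```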

It should be noted that the PS_OK signal triggers the pin transition before the power module 103 actually loses power, at least 200 microseconds in advance. This can be achieved through the capacitor energy storage of the power module 103, which will not be described in detail.

After confirming that the node 100 is powered down, the microcode module 1021 in the interface unit 102 generates a power-down notification message according to the identifier of the node 100, the power-down status, and the power-down time, and sends the generated power-down notification message to the node 200 and the node 300 through the network.

Specifically, when the microcode module 1021 obtains the node 100 power-down information, the node 100 is about to power down. If the microcode module 1021 cannot quickly generate a power-down notification message, it may fail to send the power-down notification message due to the node 100 being powered down. To improve the speed at which the microcode module 1021 sends a power-down notification message when the node 100 is powered down, the information required by the power-down notification message can be configured in the microcode module 1021 when the node 100 is initialized at power-up. In this way, when the microcode module 1021 obtains the information about the power down of the node 100, it can quickly generate and send a power down notification message according to the configured information. For example, when the node 100 is powered on and initialized, the identifier of the node 100 is configured in the microcode module 1021. Optionally, when a power-down notification message needs to be sent directionally, the IP address information of the node to which it needs to be sent may also be configured in the microcode module 1021. In this way, when the microcode module 1021 obtains the information of the node 100 power down, according to the pre-configured information and a time stamp, the power down notification message of the node 100 power down can be quickly generated and sent.
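The idea of configuring the message contents at power-on initialization, so that only a time stamp has to be added when the power-down signal arrives, can be sketched as follows. The class name `PowerDownNotifier`, the message fields, and the example addresses are hypothetical.

```python
import time
from typing import List, Optional


class PowerDownNotifier:
    """Holds everything except the time stamp, configured at power-on initialization."""

    def __init__(self, node_id: int, peer_addresses: Optional[List[str]] = None) -> None:
        self.node_id = node_id
        self.peer_addresses = list(peer_addresses or [])

    def build_message(self, timestamp: Optional[float] = None) -> dict:
        # Only the time stamp is filled in when the power-down signal arrives.
        return {
            "type": "power_down",
            "node_id": self.node_id,
            "state": "powered_down",
            "timestamp": time.time() if timestamp is None else timestamp,
            "destinations": self.peer_addresses,
        }


# Power-on initialization: configure the identifier and, for directed sends, the peers.
notifier = PowerDownNotifier(100, ["192.0.2.20", "192.0.2.30"])
# Power-down detected: generate the message from the configured information.
message = notifier.build_message(timestamp=1000.0)
```

Because no lookup or configuration read happens at power-down time, the work left for the short interval before power is lost is reduced to adding the time stamp and sending.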

Other nodes (node 200 and node 300) receive the power-down notification message sent by node 100 by listening for the message or by directly receiving it, obtain the power-down information of node 100 from the received power-down notification message, and perform management operations such as service switching or isolation through the cluster management software.

It can be understood that the reset notification message or the power-down notification message in the embodiment of the present application may be an ordinary message that carries only the necessary information, such as reset information or power-down information. Of course, the reset notification message or the power-down notification message in the embodiment of the present application differs from an ordinary heartbeat message. An ordinary heartbeat message does not carry information such as reset or power-down information, and is instead used to detect communication timeouts and connectivity.

The above is an implementation manner of quickly notifying other nodes when the node 100 is reset or powered off. For other nodes in the distributed system, such as node 200 and node 300, when a reset or power failure occurs, the implementation is similar to the implementation of node 100, and will not be described in detail.

In another implementation manner of the distributed system provided by the embodiments of the present application, the nodes in the distributed system further include a message monitoring module. The message monitoring module is used to monitor reset or power-down information sent by other nodes. As shown in FIG. 4, the control unit 101 of the node 100 further includes a message monitoring module 1012, the control unit 201 of the node 200 further includes a message monitoring module 2012, and the control unit 301 of the node 300 further includes a message monitoring module 3012. Taking node 100 as an example, the message monitoring module 1012 in the node 100 is used to monitor the reset or power-down notification messages sent by the node 200 or the node 300, so that the node 100 can perform management operations such as service switching or isolation through the cluster management software.
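A message monitoring module of this kind can be sketched as a dispatcher that maps received notification types to management operations. The mapping used here (reset leads to service switching, power-down leads to isolation), the handler names, and the message fields are assumptions for illustration; the application leaves the choice of operation to the cluster management software.

```python
from typing import Callable, Dict, List

actions: List[str] = []


def switch_services(node_id: int) -> None:
    # Hypothetical stand-in for the cluster management software switching services.
    actions.append(f"switch services away from node {node_id}")


def isolate_node(node_id: int) -> None:
    # Hypothetical stand-in for isolating the affected node.
    actions.append(f"isolate node {node_id}")


HANDLERS: Dict[str, Callable[[int], None]] = {
    "reset": switch_services,
    "power_down": isolate_node,
}


def on_notification(message: dict) -> bool:
    """Dispatch a received notification; return False for unrelated messages."""
    handler = HANDLERS.get(message.get("type"))
    if handler is None:
        return False
    handler(message["node_id"])
    return True


handled = on_notification({"type": "reset", "node_id": 200})
ignored = on_notification({"type": "heartbeat", "node_id": 300})
```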

It can be understood that the statement that the control unit includes a message monitoring module specifically means that the operating system run by the control unit includes the message monitoring module. For conciseness and convenience, the embodiments and drawings of this application describe the operating system run by the control unit including a message monitoring module as the control unit including a message monitoring module, and describe the functions that the control unit implements by executing the code corresponding to the message monitoring module as functions implemented by the message monitoring module.

FIG. 5 is a schematic structural diagram of another implementation manner of a distributed system provided by an embodiment of the present application. As shown in FIG. 5, the difference between FIG. 5 and FIG. 2 is that each node includes a BMC. For example, node 100 further includes BMC 104, node 200 further includes BMC 204, and node 300 further includes BMC 304.

Taking node 100 as an example, BMC 104 is connected to control unit 101, interface unit 102, and power module 103, respectively. When the node 100 is to be reset, the BMC 104 obtains the reset information of the node 100 from the control unit 101, and sends a notification message to the interface unit 102 to pass the reset information of the node 100 to the interface unit 102. After the interface unit 102 obtains the message that the node 100 is to be reset, the node 100 can start the reset. The manner in which the control unit 101 acquires the reset information of the node 100 is the same as that described above for FIG. 2 and FIG. 3, and details are not described herein again.

The interface unit 102 generates a reset notification message according to the reset information of the node 100 acquired from the BMC 104, and sends it to the node 200 and the node 300. The manner in which the interface unit 102 sends the reset information of the node 100 to the node 200 and the node 300 is similar to the manner in which the interface unit 102 in FIG. 2 and FIG. 3 sends it, and will not be described again.

When the power module 103 in the node 100 is abnormal, the power module 103 triggers a power-down indication signal to notify the BMC 104. For example, the power module 103 may use a pin to indicate power-down, thereby generating the power-down indication signal. In a specific implementation, the power-down indication signal may be a PS_OK signal, or another signal used to indicate mains power-down. The power module 103 may trigger a pin transition of the BMC 104 through the power-down indication signal, so as to transmit the power-down information of the node 100 to the BMC 104. The BMC 104 transfers the acquired power-down information of the node 100 to the interface unit 102; for example, it may send a message containing the power-down information of the node 100 to the interface unit 102. The interface unit 102 generates a power-down notification message according to the power-down information of the node 100 and sends it to the node 200 and the node 300 to notify them that the node 100 is powered down, in the same way as the interface unit 102 in FIG. 2 or FIG. 3 notifies the node 200 and the node 300 through the power-down notification message, and details are not described herein again.

When the BMC 104 notifies the node 200 and the node 300 of the reset information of the node 100 through the interface unit 102, the notification is slightly slower than when the control unit 101 notifies the node 200 and the node 300 directly through the interface unit 102. Likewise, when the BMC 104 notifies the node 200 and the node 300 of the power-down of the node 100 through the interface unit 102, the notification is slightly slower than when the power module 103 informs the interface unit 102 directly. Nevertheless, the reset or power-down of the node 100 can still be quickly notified to the node 200 and the node 300, which improves the efficiency and speed of notification compared with detection by heartbeat. Moreover, since no preset threshold for heartbeat detection needs to be set, misjudgment caused by an improper threshold setting is avoided.
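The threshold problem of heartbeat detection can be illustrated with a small sketch. All timings are hypothetical: a peer is alive, but one of its heartbeats is delayed, so a short threshold misjudges the peer as failed, while a long threshold would delay the detection of a real failure.

```python
from typing import List


def heartbeat_declares_failure(arrival_times: List[float], timeout: float, now: float) -> bool:
    """Heartbeat rule: a peer is judged failed if no heartbeat arrived within `timeout`."""
    last_seen = max(arrival_times) if arrival_times else float("-inf")
    return (now - last_seen) > timeout


# Hypothetical timings in seconds: the peer is alive, but the heartbeat expected
# at t=3.0 is delayed by network congestion and has not arrived by t=3.5.
arrivals = [0.0, 1.0, 2.0]

# Threshold too short: the live peer is misjudged as failed.
too_short = heartbeat_declares_failure(arrivals, timeout=1.2, now=3.5)

# Threshold long enough to tolerate the delay, at the cost that a real failure
# would only be detected 5 seconds after the last heartbeat.
too_long = heartbeat_declares_failure(arrivals, timeout=5.0, now=3.5)
```

An actively sent reset or power-down notification involves no such threshold, which is the source of the advantage stated above.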

Referring to FIG. 6, FIG. 6 is a schematic structural diagram of another implementation manner of a distributed system provided by an embodiment of the present application. Each node includes a control unit, a power module, and a BMC, and each node sends a reset or power-down notification message to other nodes through the BMC. As shown in FIG. 6, the node 100 includes a control unit 101, a power module 103, and a BMC 104, the node 200 includes a control unit 201, a power module 203, and a BMC 204, and the node 300 includes a control unit 301, a power module 303, and a BMC 304.

Taking the node 100 as an example, the implementation of the control unit 101 is similar to the implementation of the control unit in FIG. 2 or FIG. 3 and will not be described in detail. The BMC 104 acquires the reset information of the node 100 from the control unit 101, generates a reset notification message according to the reset information of the node 100, and sends it to the node 200 and the node 300. The specific manner in which the BMC 104 acquires the reset information of the node 100 from the control unit 101 is similar to the manner in which the interface unit 102 in FIG. 2 or FIG. 3 acquires the reset information of the node 100 from the control unit. Similarly, the manner in which the BMC 104 obtains the power-down information of the node 100 from the power module 103 is similar to the manner in which the interface unit 102 in FIG. 2 or FIG. 3 obtains the power-down information of the node 100. After the BMC 104 obtains the information about the reset or power-down of the node 100, it may notify the node 200 and the node 300 by sending a directed message or a broadcast message. For example, the BMC 104 in the node 100 sends a reset notification message containing information about the reset of the node 100 to the BMC 204 in the node 200 and the BMC 304 in the node 300.

In specific implementation, the control unit in each node of the distributed system shown in FIG. 6 further includes a reset detection module, and the BMC further includes a notification module. As shown in FIG. 7, the control unit 101 includes a reset detection module 1011, the BMC 104 includes a notification module 1041; the control unit 201 includes a reset detection module 2011, and the BMC 204 includes a notification module 2041; the control unit 301 includes a reset detection module 3011, and the BMC 304 includes a notification module 3041 . The reset detection module in the control unit shown in FIG. 7 is similar to the reset detection module in the control unit shown in FIG. 3, and may be a reset detection module in the operating system that the control unit runs, and will not be described in detail.

Taking as an example the case in which the node 100 runs the Linux operating system and software in the node 100 needs to be reset, the following describes how the reset detection module 1011 acquires the reset information and transmits it to the notification module 1041. The reset detection module 1011 can be registered on the notification chain provided by the Linux operating system. When a piece of software running in the node 100 is to be reset, a notification is sent to the notification chain. The reset detection module 1011 registered on the notification chain learns through its callback function that the node 100 is about to be reset, and transmits the reset information of the node 100 to the notification module 1041 of the BMC 104 through the communication channel between the control unit and the BMC, which may be, for example, a PCIE 3.0 communication channel.

Optionally, the reset detection module 1011 may send the notification message that the node 100 is about to be reset according to a private protocol with the notification module 1041. For example, the reset detection module 1011 uses a private interface command predefined with the notification module 1041 to carry the reset information of the node 100 and the identifier of the node 100, and sends the private interface command to the notification module 1041.

After receiving the message sent by the reset detection module 1011, the notification module 1041 generates a reset notification message according to the identifier of the node 100 and the reset information of the node 100, and sends the generated reset notification message to the node 200 and the node 300 through the network in a directed or broadcast manner.

The notification module 1041 may send the generated notification message to the node 200 and the node 300 through the network in multiple implementation manners, and the specific implementation manner is not limited in this application. For example, when the distributed system composed of the node 100, the node 200, and the node 300 has its own private network, the notification module 1041 may send a reset notification message through the private network. If the distributed system composed of node 100, node 200, and node 300 does not have its own private network, the notification module 1041 may send a reset notification message to the node 200 and the node 300 via the public network. The reset notification message may carry the IP address of the node 100, and the IP addresses of the node 200 and the node 300. It can be understood that the transmission of the reset notification message through the private network is more efficient and real-time than the transmission of the reset notification message through the public network.

Other nodes (node 200 and node 300) receive the reset notification message by listening or directly receive the reset notification message. Specifically, it may be that the BMC 204 in the node 200 receives the reset notification message sent by the node 100 through the BMC 104, or the BMC 304 in the node 300 receives the reset notification message sent by the node 100 through the BMC 104.

Optionally, the BMC 204 of the node 200 may include a receiving module (not shown in the figure) for receiving a reset notification message sent by the notification module 1041 in the BMC 104. Optionally, the receiving module and the notification module 2041 in the BMC 204 may be the same module. The implementation manner of the BMC 304 in the node 300 is similar to the implementation manner of the BMC 204 in the node 200, and will not be described in detail.

After other nodes (node 200 and node 300) obtain the reset information of node 100, they can perform management operations such as service switching or isolation through cluster management software.

When the power supply in the node 100 is abnormal or is powered down, the BMC 104 may also send a power-down notification message. The power module 103 may trigger a power-down indication signal to pass the information about the power-down of the node 100 to the BMC 104. For example, the power module 103 may indicate power-down through a pin to generate the power-down indication signal. In a specific implementation, the power-down indication signal may be a PS_OK signal, or another signal used to indicate mains power-down. The power-down indication signal triggers a transition of a pin of the BMC 104, and the BMC 104 obtains the information about the power-down of the node 100 according to the transition of the pin. Optionally, a pin defined by the notification module 1041 may also be triggered to transmit the information that the node 100 is powered down.

Specifically, when the notification module 1041 obtains the node 100 power-down information, the node 100 is about to power down. If the notification module 1041 cannot quickly generate a power-down notification message, it may fail to send the power-down notification message due to the node 100 being powered down. In order to increase the speed of the notification module 1041 to send a power-down notification message when the node 100 is powered off, the information required by the power-down notification message may be configured in the notification module 1041 when the node 100 is initialized after power-on. In this way, the notification module 1041 can quickly generate and send a power-down notification message according to the configured information when acquiring the power-down information of the node 100. For example, when the node 100 is powered on and initialized, the identifier of the node 100 is configured in the notification module 1041. When the power-down notification message needs to be sent in a targeted manner, the IP address information of the node to which it needs to be sent may also be configured in the notification module 1041. In this way, when the notification module 1041 obtains the information of the node 100 power down, according to the pre-configured information and a time stamp, the power down notification message of the node 100 power down can be quickly generated and sent.

Other nodes (node 200 and node 300) receive the power-down notification message by listening for it or by directly receiving it, obtain the power-down information of the node 100 from the received power-down notification message, and perform management operations such as service switching or isolation through the cluster management software.

The above is an implementation manner of quickly notifying other nodes when the node 100 is reset or powered off. For other nodes in the distributed system, such as node 200 and node 300, when reset or power failure occurs, the implementation manner is similar to that of node 100, and will not be described in detail.

In another implementation manner of the distributed system provided by the embodiments of the present application, the nodes in the distributed system further include a message monitoring module. The message monitoring module is used to monitor reset or power-down information sent by other nodes. As shown in FIG. 8, the control unit 101 of the node 100 further includes a message monitoring module 1012, the control unit 201 of the node 200 further includes a message monitoring module 2012, and the control unit 301 of the node 300 further includes a message monitoring module 3012. Taking the node 100 as an example, the message monitoring module 1012 in the node 100 is used to monitor the reset or power-down notification message sent by the node 200 or the node 300, and to perform management operations such as service switching or isolation through the cluster management software. The implementation of the message monitoring module in FIG. 8 is similar to that of the message monitoring module in FIG. 4 and will not be repeated here.

In the above embodiments, when the node 100 is reset, the reset information of the node 100 is sent to other nodes in the distributed system through the control unit 101 and the interface unit 102, or through the BMC. In specific implementation, other software or hardware may also be used to send the reset information of the node. For example, the operating system of the node 100 may directly send the reset information of the node 100 to other nodes in the distributed system through a certain interface; other chips or logic units may send the reset information of the node 100 to other nodes in the distributed system; or the microcode module 1021 in the interface unit 102 may obtain the reset information of the node 100 from other chips or logic units and send it to other nodes in the distributed system. Any manner in which the node 100, when it is reset, actively sends its reset information to other nodes in the distributed system, and thereby improves the efficiency and real-time performance of notification relative to heartbeat detection, falls within the scope of the embodiments of the present application.

Similarly, when the node 100 is powered down, the power module 103 and the interface unit 102 send the power-down information of the node 100 to other nodes in the distributed system, or the BMC sends the power-down information of the node 100 to other nodes in the distributed system. In specific implementation, other hardware or software in the node 100 may also be used to send the power-down information of the node. For example, other chips or logic units of the node 100 may send the power-down information of the node 100 to other nodes in the distributed system. Any manner in which the node 100, when it is powered down, actively sends its own power-down information to other nodes in the distributed system, and thereby improves the efficiency and real-time performance of notification relative to heartbeat detection, falls within the scope of the embodiments of the present application.

FIG. 9A is a schematic structural diagram of a computer device 900 according to an embodiment of the present application. As shown in FIG. 9A, the computer device 900 includes a processor 901 and a message sending unit 902. The computer device 900 is a computer device in a distributed system, and the distributed system may include more than two computer devices.

The processor 901 is configured to acquire the reset information of the computer device 900 when the computer device 900 is reset, and transmit the reset information of the computer device 900 to the message sending unit 902;

The message sending unit 902 is configured to generate, according to the reset information of the computer device 900, a reset notification message including the reset information of the computer device 900, and to send the reset notification message to other computer devices in the distributed system.

The above-mentioned computer device 900 obtains the reset information of the device through the message sending unit 902, generates a reset notification message containing the reset information according to the obtained reset information, and sends it to other devices in the distributed system, so that the reset information of this device is quickly notified to the other devices in the distributed system. Compared with the prior-art method of detecting whether other devices are reset by heartbeat, this not only improves the efficiency of transmitting reset information, but also avoids misjudgment.

Specifically, the specific implementation manner of the computer device 900 may be implemented with reference to the implementation manner of the node 100 in FIGS. 2 to 4 described above. For example, the message sending unit 902 is similar to the implementation of the interface unit 102 in FIGS. 2 to 4, and the processor 901 is similar to the implementation of the control unit 101 in FIGS. 2 to 4 and will not be described in detail.

Optionally, as shown in FIG. 9B, the computer device 900 further includes a power module 903. The power module 903 may be connected to the message sending unit 902 through a general-purpose input/output (GPIO) pin;

The power module 903 is used to transfer the power-down information of the computer device 900 to the message sending unit 902 by triggering a transition of the pin when the computer device 900 is powered down;

The message sending unit 902 is further configured to generate a power-down notification message containing the power-down information of the computer device 900 according to the power-down information of the computer device 900, and to send the power-down notification message to the other computer devices in the distributed system.

The above power supply module 903 is similar to the implementation manner of the power supply module 103 in FIGS. 2 to 4 and will not be described in detail.

The above-mentioned computer device 900 obtains the power-down information of the device through the message sending unit 902, generates a power-down notification message containing the power-down information according to the obtained power-down information, and sends it to other devices in the distributed system, so that the power-down of the device is quickly notified to the other devices in the distributed system. Compared with the prior-art method of detecting whether other devices are powered down by heartbeat, this not only improves the efficiency of transmitting power-down information, but also avoids misjudgment.

FIG. 10 is a schematic structural diagram of another implementation manner of a computer device 900 provided by an embodiment of this application. As shown in FIG. 10, the computer device 900 further includes a BMC 904. The BMC 904 is used to obtain the reset information of the computer device 900 from the operating system in the computer device 900, and send the reset information of the computer device 900 to the message sending unit 902; or the BMC 904 is used to obtain the power-down information of the computer device 900 from the power module, and send the power-down information of the computer device 900 to the message sending unit 902.

The BMC 904 obtains the reset or power-down information of the computer device 900, and the message sending unit 902 sends that information to other computer devices in the distributed system. This manner can improve the efficiency with which computer devices in the distributed system obtain reset or power-down information, and can avoid misjudgment.

Specifically, the implementation manner of the BMC904 in FIG. 10 can be implemented by referring to the implementation manner of the BMC104 in FIG. 5 described above, and details are not described herein again.

In another implementation manner of the embodiment of the present application, the message sending unit 902 in the computer device 900 is implemented by the BMC 904. As shown in FIG. 11, the computer device 900 includes a central processor 901, a BMC 904, and a power module 903, and the BMC 904 includes a message sending unit 902. Specifically, the computer device 900 shown in FIG. 11 can be implemented with reference to the implementation of the node 100 in FIG. 7 or FIG. 8 described above. For example, the BMC 904 can be implemented with reference to the implementation of the BMC 104 in FIG. 7 or FIG. 8, and the message sending unit 902 in the BMC 904 can be implemented with reference to the implementation of the notification module 1041 in FIG. 7 or FIG. 8; details are not described herein again.

FIG. 12 is a schematic flowchart of a method for sending device information according to an embodiment of the present application. As shown in FIG. 12, the method includes:

Step S100: The processor in the computer device obtains the reset information of the computer device before resetting the computer device, and transmits the reset information of the computer device to the message sending unit in the computer device;

Step S200: the message sending unit receives the reset information of the computer device;

Step S300: The message sending unit generates a reset notification message containing the reset information of the computer device according to the reset information of the computer device, and sends the reset notification message to other computer devices in the distributed system where the computer device is located.

The above method obtains the reset information of the computer device, generates a reset notification message containing the reset information according to the obtained reset information, and sends the message to other devices in the distributed system, so that the other devices in the distributed system can be quickly notified of the reset of the device. Compared with the prior art method of detecting whether other devices are reset by heartbeat, not only can the efficiency of reset information transmission be improved, but the occurrence of misjudgment can also be avoided.
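Steps S100 to S300 can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the class `MessageSendingUnit`, the function `processor_prepare_reset`, the JSON message layout, the `RESET_NOTIFY` type string, and the in-memory "peers" are all hypothetical names chosen for this example and are not defined by this application.

```python
import json
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical transport: maps a peer name to a function that delivers bytes to it.
Transport = Dict[str, Callable[[bytes], None]]


@dataclass
class MessageSendingUnit:
    """Illustrative stand-in for the message sending unit (all names are assumptions)."""
    node_id: str
    peers: Transport

    def on_reset_info(self, reset_info: dict) -> bytes:
        # Step S200: receive the reset information from the processor.
        # Step S300: build a reset notification message that contains the
        # reset information and send it to every other device.
        message = json.dumps(
            {"type": "RESET_NOTIFY", "node": self.node_id, "info": reset_info},
            sort_keys=True,
        ).encode()
        for deliver in self.peers.values():
            deliver(message)
        return message


def processor_prepare_reset(unit: MessageSendingUnit, reason: str) -> bytes:
    # Step S100: obtain the reset information before the reset happens
    # and hand it to the message sending unit.
    reset_info = {"reason": reason}
    return unit.on_reset_info(reset_info)


# Usage: two simulated peers that simply collect what they receive.
inboxes = {"node-b": [], "node-c": []}
peers = {name: inbox.append for name, inbox in inboxes.items()}
unit = MessageSendingUnit(node_id="node-a", peers=peers)
sent = processor_prepare_reset(unit, "os_reboot")
```

In this sketch each peer learns of the reset as soon as the message is delivered, rather than inferring it later from missed heartbeats.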

The above method may be implemented by a computer device in a distributed system. For specific implementation, reference may be made to the implementation manner of the node 100 in FIG. 2 to FIG. 8 described above, and details are not described herein again.

Optionally, in the above method, the method further includes: the processor acquiring reset information of the computer device through a preset function and a notification chain about reset in the operating system of the computer device.

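Optionally, the preset function is a callback function, and the callback function is registered on the notification chain; the processor obtains the reset information of the computer device from the notification chain through the callback function.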
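The relationship between the notification chain and the registered callback can be sketched as follows. This is a user-space simulation, not operating-system code: `NotifierChain`, `reset_detect_callback`, and the event name `"RESTART"` are hypothetical and serve only to show how a callback registered on a chain receives the reset event before the reset takes place.

```python
from typing import Callable, List


class NotifierChain:
    """Minimal notification chain: callbacks run in registration order."""

    def __init__(self) -> None:
        self._callbacks: List[Callable[[str], None]] = []

    def register(self, callback: Callable[[str], None]) -> None:
        self._callbacks.append(callback)

    def call_chain(self, event: str) -> None:
        # Invoked by the (simulated) operating system before it resets.
        for callback in self._callbacks:
            callback(event)


reset_chain = NotifierChain()
captured = []  # reset information collected for the message sending unit


def reset_detect_callback(event: str) -> None:
    # The preset (callback) function: records the reset event so that it
    # can be passed on to the message sending unit.
    captured.append({"event": event})


reset_chain.register(reset_detect_callback)

# The simulated operating system is about to reset, so it walks the chain.
reset_chain.call_chain("RESTART")
```

Because the callback runs as part of the reset path itself, the reset information is available before the device actually goes down.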

Optionally, the processor transmits the reset information of the computer device to the message sending unit through the BMC in the computer device.

Optionally, the message sending unit is a BMC in the computer device.

Optionally, the method further includes:

When the computer device is powered off, the message sending unit acquires the power-off information of the computer device through the transition of the pin, generates a power-down notification message containing the power-off information of the computer device, and sends the power-down notification message to the other computer devices in the distributed system.

The above method obtains the power-down information of the device, generates a power-down notification message containing the power-down information according to the obtained power-down information, and sends the message to other devices in the distributed system, so that the other devices in the distributed system can be quickly notified of the power-off of the device. Compared with the prior art method of detecting whether other devices are powered off by heartbeat, not only can the efficiency of power-off information transmission be improved, but the occurrence of misjudgment can also be avoided.
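The pin-transition mechanism can be sketched in the same simulated style. The assumptions here are labeled plainly: the class `PowerPin`, the polarity (level 1 means power present, a fall to 0 means power-off), the function `send_to_peers`, and the `POWER_DOWN_NOTIFY` type string are all hypothetical; the application only states that power-off information is conveyed through a transition of the pin.

```python
from typing import Callable, List


class PowerPin:
    """Simulated pin between the power module and the message sending unit.

    Assumed polarity: level 1 = power present, level 0 = power lost.
    """

    def __init__(self, level: int = 1) -> None:
        self.level = level
        self._listeners: List[Callable[[], None]] = []

    def on_falling_edge(self, callback: Callable[[], None]) -> None:
        self._listeners.append(callback)

    def set_level(self, level: int) -> None:
        # Only a 1 -> 0 transition counts as a power-off event.
        falling = self.level == 1 and level == 0
        self.level = level
        if falling:
            for callback in self._listeners:
                callback()


sent_messages: List[dict] = []


def send_to_peers(message: dict) -> None:
    # Hypothetical transport: a real unit would send this to the other devices.
    sent_messages.append(message)


def handle_power_down() -> None:
    # Generate the power-down notification message and send it.
    send_to_peers({"type": "POWER_DOWN_NOTIFY", "node": "node-a"})


pin = PowerPin()
pin.on_falling_edge(handle_power_down)
pin.set_level(0)  # the power module signals power-off via the pin
pin.set_level(0)  # level unchanged: no transition, so no second message
```

Reacting to the edge rather than to the level means a single power-off produces a single notification.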

Those of ordinary skill in the art may realize that the units, modules, and steps of the examples described in conjunction with the embodiments disclosed herein can be implemented by electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been generally described above in terms of function. Whether these functions are executed in hardware or software depends on the specific application of the technical solution and design constraints. Professional technicians can use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of the present invention.

In the several embodiments provided in this application, it should be understood that the disclosed system, device, and method may be implemented in other ways. For example, the node 100 described above is only schematic; the division of the above units or modules is only a division of logical functions, and there may be other divisions in actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented. In addition, the displayed or discussed mutual connection, direct connection, or communication connection may be a connection or communication connection through some interfaces, devices or units, and may also be an electrical, mechanical or other form of connection.

The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the embodiments of the present invention.

In addition, each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The above integrated unit may be implemented in the form of hardware or software functional unit.

If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention essentially, or the part that contributes to the existing technology, or all or part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions to enable a computer device (which may be a personal computer, server, or network device, etc.) to perform all or part of the steps of the methods described in various embodiments of the present invention. The aforementioned storage media include: a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disk, and other media that can store program code.

The above are only specific embodiments of the present invention, but the scope of protection of the present invention is not limited to this. Any person skilled in the art can easily think of various equivalent modifications or replacements within the technical scope disclosed by the present invention, and these modifications or replacements should be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

A computer device, the computer device being a computer device in a distributed system and including a processor, characterized in that the computer device further includes a message sending unit, and the message sending unit and the processor are connected through a bus;

The processor is configured to acquire the reset information of the computer device when the computer device is reset, and transmit the reset information of the computer device to the message sending unit;

The message sending unit is configured to generate a reset notification message including the reset information of the computer device based on the reset information of the computer device, and send the reset notification message to other computer devices in the distributed system.
The computer device according to claim 1, characterized in that:

The operating system in the computer device includes a notification chain about resetting;

Before the computer device is reset, the processor obtains the reset information of the computer device from the notification chain through a preset function.
The computer device according to claim 1, wherein the operating system in the computer device includes a reset detection module,

The reset detection module is registered on a reset notification chain in the operating system of the computer device, and obtains information about the reset of the computer device on the reset notification chain through a callback function;

The processor obtains reset information of the computer device through the reset detection module.
The computer device according to any one of claims 1-3, characterized in that:

The message sending unit includes a microcode module;

The microcode module is used to generate the reset notification message and send the reset notification message to other computer devices in the distributed system.
The computer device according to any one of claims 1-4, wherein the computer device further comprises a baseboard management controller BMC;

The processor transmits the reset information of the computer device to the message sending unit through the BMC.
The computer device according to any one of claims 1 to 5, wherein the computer device further comprises a power supply module, and the power supply module is connected to the message sending unit through a general-purpose input/output (GPIO) pin;

The power supply module is used to transfer the power-off information of the computer device to the message sending unit by triggering a transition of the pin when the computer device is powered off;

The message sending unit is further configured to generate a power-down notification message containing the power-off information of the computer device based on the power-off information of the computer device, and send the power-down notification message to other computer devices in the distributed system.
The computer device according to any one of claims 1 to 3, wherein the message sending unit is a baseboard management controller BMC, and the BMC further includes a notification module;

The notification module generates the reset notification message according to the reset information of the computer device acquired by the BMC, and sends the reset notification message to the other computer devices in the distributed system.
The computer device according to claim 7, wherein the computer device further comprises a power supply module;

The power supply module is also used to transmit the power-off information of the computer device to the BMC by triggering a transition of the pin when the computer device is powered off;

The notification module is further configured to generate a power-down notification message including the power-off information of the computer device according to the acquired power-off information of the computer device, and send the power-down notification message to other computer devices in the distributed system.
A distributed computer device system, characterized by comprising at least two computer devices according to any one of claims 1-8.
A method for sending device information, characterized in that the method includes:

The processor in the computer device acquires the reset information of the computer device before resetting the computer device, and transmits the reset information of the computer device to the message sending unit in the computer device;

The message sending unit receives the reset information of the computer device;

The message sending unit generates a reset notification message containing the reset information of the computer device based on the reset information of the computer device, and sends the reset notification message to other computer devices in the distributed system where the computer device is located.
The method of claim 10, further comprising:

The processor obtains reset information of the computer device through a preset function and a notification chain about reset in the operating system of the computer device.
The method according to claim 11, wherein the preset function is a callback function, and the callback function is registered on the notification chain;

The processor acquiring the reset information of the computer device through a preset function and a notification chain about reset in the operating system of the computer device includes:

The processor obtains the reset information of the computer device from the notification chain through the callback function.
The method according to any one of claims 10-12, wherein the method further comprises:

The processor transmits the reset information of the computer device to the message sending unit through the baseboard management controller BMC in the computer device.
The method according to any one of claims 10-12, characterized in that:

The message sending unit is a baseboard management controller BMC in the computer device.
The method according to any one of claims 11-14, wherein the method further comprises:

When the computer device is powered off, the message sending unit acquires the power-off information of the computer device through the transition of the pin, generates a power-down notification message containing the power-off information of the computer device, and sends the power-down notification message to the other computer devices in the distributed system.