FI127498B - Rack having automatic recovery function and automatic recovery method for the same - Google Patents

Publication number
FI127498B
Authority
FI
Finland
Prior art keywords
bmc
rmc
rack
default communication
communication channel
Application number
FI20155123A
Other languages
Finnish (fi)
Swedish (sv)
Other versions
FI20155123A (en)
Inventor
Yen-Yu Chen
Wan-Chun Yeh
yu-heng Su
Shih-Chieh Hsu
Original Assignee
Aic Inc
Application filed by Aic Inc
Publication of FI20155123A
Application granted
Publication of FI127498B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00 Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/24 Resetting means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0706 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication

Abstract

A rack comprising a control module and a plurality of nodes is disclosed. The control module comprises a rack management controller (RMC), and each of the plurality of nodes comprises a baseboard management controller (BMC). The RMC communicates with the BMCs through a plurality of default communication channels respectively, and the RMC controls the nodes and transmits necessary data thereto through the BMCs. When the RMC receives no response signal from one of the BMCs, it resends the same signal to the non-responding BMC. If a resend threshold is reached, the RMC sends a control signal directly to a reset pin of the non-responding BMC through a GPIO channel to force the non-responding BMC to reset.

Description

RACK HAVING AUTOMATIC RECOVERY FUNCTION AND AUTOMATIC RECOVERY METHOD FOR THE SAME

BACKGROUND OF THE INVENTION

1. Field of the Invention

The invention relates to a rack, and in particular to a rack having an automatic recovery function and to an automatic recovery method used by the rack.

2. Description of Prior Art

Generally, each server arranged in a rack comprises a baseboard management controller (BMC), and each server uses its BMC to control and maintain itself.

The rack usually comprises a rack management controller (RMC) used to communicate with the BMCs in the servers. The rack uses the RMC to control the servers, collect information from the servers, and transmit files needed by the servers (such as update files for updating firmware) through the BMCs.

In the related art, the RMC communicates with the BMCs through communication channels such as intelligent platform management bus (IPMB), inter-integrated circuit (I²C) or local area network (LAN), and uses the communication channels to transmit control commands, information and files.

However, each communication channel mentioned above is bi-directional. More specifically, if the RMC wants to communicate with a target BMC, it needs to send an initial ASK signal to the target BMC in advance. After receiving a RESPONSE signal from the target BMC, the RMC can be sure that the communication channel is working, and then transmits the actual data to the target BMC. In other words, if the target BMC itself or a communication interface of the BMC has a problem (for example, a firmware failure or a hardware signal error) such that the target BMC cannot respond to the ASK signal from the RMC, the RMC cannot communicate with the target BMC.

In current racks, each server in the rack is configured with a watchdog function, which can detect problems of the BMC and reset the BMC automatically when the BMC has a problem. However, the watchdog function can only detect certain specific failures (for example, the whole BMC shutting down). In some situations, the watchdog function cannot accurately detect what has happened to the BMC and will not reset the BMC automatically. As a result, the RMC can only notify a manager of the rack (for example, by raising an alert via a buzzer or an LED, or by sending an e-mail or MMS to the manager).

If the manager receives the above alert, he or she resets the BMC manually (for example, by pulling the server from the rack to interrupt the power of the BMC, and then inserting the server into the rack again to reset the BMC).

As described above, the communication problem between the RMC and the BMC can only be solved manually in the related art, which is very inconvenient. Also, if the rack is sold to a client and the client lacks the ability to solve the above problem, the client needs to send the rack or the server back to the original factory for repair, or to ask the manager to repair the rack or the server at the client's site.

Document US5636341A discloses a system with communication processors in which, when a fault occurs, the interface controller is notified of the occurrence and the failed location by means of a broadcast signal.

Document US2005223284A1 discloses a data storage system in which the operation of a communication subsystem is enabled to provide communication between the storage processors.

Document D3 discloses an electronic control apparatus for checking a reset function, having a main controller resetting the failure signal using a test failure signal.

Document D4 discloses a networked multi-computer environment with redundant links, where network interface cards (NICs) are commonly duplicated and teamed to provide a recovery mechanism when network components fail.

SUMMARY OF THE INVENTION

The object of the present invention is to provide a rack having an automatic recovery function and an automatic recovery method used by the rack, which can reset a baseboard management controller (BMC) so that it recovers to an initial status when a rack management controller (RMC) in the rack cannot communicate normally with the BMC in a node of the rack.

According to the above object, the present invention discloses a rack comprising a control module and a plurality of nodes. The control module comprises the RMC, and each of the plurality of nodes comprises a BMC. The RMC communicates with the BMCs through a plurality of default communication channels respectively, and the RMC controls the nodes and transmits necessary data thereto through the BMCs. When the RMC receives no response signal from one of the BMCs, it resends the same signal to the non-responding BMC. If a resend threshold is reached, the RMC sends a control signal directly to a reset pin of the non-responding BMC through a GPIO channel to force the non-responding BMC to reset.

Compared with the related art, the present invention can force a BMC to reset and recover to an initial status through a simple and stable hardware function whenever the BMC has a problem and cannot communicate with the RMC in the rack. The RMC can establish a communication channel with the BMC again after the BMC recovers to the initial status.

Therefore, the present invention can ensure that the RMC can always control all BMCs in the rack in any situation.

BRIEF DESCRIPTION OF THE DRAWINGS

Fig. 1 is a schematic view of a rack of a first embodiment according to the present invention.

Fig. 2 is a connection diagram of a first embodiment according to the present invention.

Fig. 3 is a connection diagram of a second embodiment according to the present invention.

Fig. 4 is a reset flowchart of a first embodiment according to the present invention.

DETAILED DESCRIPTION OF THE INVENTION

The technical contents and detailed description of the present invention are described hereinafter, in cooperation with the attached drawings, according to a preferred embodiment, which is not used to limit its scope. Any equivalent variation and modification made according to the appended claims is covered by the claims of the present invention.

Fig. 1 is a schematic view of a rack of a first embodiment according to the present invention. The present invention discloses a rack 1 which has an automatic recovery function described in detail below. In particular, the rack 1 comprises a control module 2 and a plurality of nodes 3, wherein the control module 2 at least comprises a circuit board 21 and a rack management controller (RMC) 22 electrically connected with the circuit board 21, and each of the plurality of nodes 3 comprises a baseboard 31 and a baseboard management controller (BMC) 32 electrically connected with the baseboard 31. The automatic recovery function in the present invention is, for example, a reset action executed to recover the BMCs 32 in the nodes 3 to an initial status free from communication problems.

The control module 2 and the nodes 3 are arranged in the rack 1, and the control module 2 is electrically connected with each node 3. As a result, the RMC 22 in the control module 2 can communicate with each BMC 32 in each node 3, and can control all of the nodes 3, collect information from the nodes 3 and transmit necessary files (for example, an update file for updating firmware) to the nodes 3 via the BMCs 32.

Fig. 2 is a connection diagram of a first embodiment according to the present invention.

As shown in Fig. 2, the RMC 22 in the control module 2 is connected with the BMCs 32 in the nodes 3 through a plurality of default communication channels 4 respectively. In this embodiment, the default communication channels 4 are implemented by intelligent platform management bus (IPMB), inter-integrated circuit (I²C), universal asynchronous receiver/transmitter (UART) or local area network (LAN), but are not limited thereto. The RMC 22 communicates with the BMCs 32 through the plurality of default communication channels 4 respectively, and transmits files needed by the nodes 3 to the BMCs 32 through the plurality of default communication channels 4, so that the BMCs 32 can use the files conveniently.

For example, each of the plurality of nodes 3 comprises a memory 33 electrically connected to the BMC 32 therein. Each memory 33 stores a basic input/output system (BIOS) needed by the node 3 in which the memory 33 is arranged. When the BIOSs of the nodes 3 need to be updated, the RMC 22 receives the update file from an external source (for example, an "*.ISO" file), and transmits the update file to the BMCs 32 through the default communication channels 4 respectively. The BMCs 32 then use the received update file to update the BIOSs in the memories 33 respectively.

To complete the updating action mentioned above, the RMC 22 needs to send an ASK signal to the BMCs 32 through the default communication channels 4 respectively before transmitting files to the BMCs 32. After receiving a RESPONSE signal corresponding to the ASK signal from each of the BMCs 32, the RMC 22 determines that the BMCs 32 are operating normally and that the default communication channels 4 are working.

Therefore, the RMC 22 can transmit the files needed by the nodes 3 to the BMCs 32 through the default communication channels 4 respectively.
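The ASK/RESPONSE gate described above can be sketched as follows. This is only an illustrative model, not the patent's implementation: the `Channel` class, the `transmit_if_ready` function and the byte values used for the signals are hypothetical names introduced here, and the real signal encoding on IPMB, I²C, UART or LAN is not specified in the text.

```python
from typing import Callable, List, Optional

# Hypothetical encodings; the source does not define the signal format.
ASK = b"ASK"
RESPONSE = b"RESPONSE"


class Channel:
    """Hypothetical stand-in for one default communication channel 4."""

    def __init__(self, responder: Callable[[bytes], Optional[bytes]]) -> None:
        self._responder = responder      # simulates the BMC on the far end
        self.sent: List[bytes] = []      # record of everything the RMC sent

    def request(self, payload: bytes) -> Optional[bytes]:
        """Send a payload and return the reply, or None if nothing came back."""
        self.sent.append(payload)
        return self._responder(payload)


def transmit_if_ready(channel: Channel, data: bytes) -> bool:
    """Send ASK first; transmit the data only if RESPONSE is received."""
    if channel.request(ASK) != RESPONSE:
        return False                     # BMC did not answer; nothing is sent
    channel.request(data)
    return True
```

With a responder that answers the ASK signal, the data follows the handshake; with a responder that stays silent, only the ASK signal is ever placed on the channel.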

Conversely, if one of the plurality of BMCs 32 does not respond to the RMC 22 (i.e., the plurality of BMCs 32 comprises at least one non-responding BMC 32), the RMC 22 cannot communicate with the non-responding BMC 32 and cannot transmit the files to it. To solve this problem, the RMC 22 in the present invention can control the non-responding BMC 32 through another simple and stable hardware function, so as to recover the BMC 32 from the non-responding status to the normal initial status.

Fig. 3 is a connection diagram of a second embodiment according to the present invention. In Fig. 3, only one BMC 32 is depicted in the rack 1 as an example, which is not intended to limit the scope of the present invention.

The main technical characteristic of the rack 1 in the present invention is that the RMC 22 is electrically connected to the circuit board 21, the BMC 32 is electrically connected to the baseboard 31, and at least one control pin (not shown) of the RMC 22 is electrically connected to a reset pin 321 of the BMC 32 directly through the circuit board 21 and the baseboard 31. More specifically, the RMC 22 in this embodiment is electrically connected to the reset pin 321 of the BMC 32 directly through a general-purpose input/output (GPIO) connection, and establishes a GPIO channel 5 with the BMC 32.

With the technical solution disclosed in the present invention, if the RMC 22 sends the ASK signal to the BMC 32 and does not receive the RESPONSE signal corresponding to the ASK signal from the BMC 32 within a waiting time, the BMC 32 is considered a non-responding BMC 32. The RMC 22 then resends the same ASK signal to the non-responding BMC 32. If the resend count (the number of times the ASK signal has been resent) exceeds a resend threshold, the RMC 22 determines that the non-responding BMC 32 has a problem (i.e., the non-responding BMC 32 is considered a problematic BMC 32).
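The detection rule in this paragraph can be illustrated with a short sketch. It assumes that the "resend time" in the text means the number of resends already performed and that "longer than" means strictly greater than; both are interpretations, and the function and parameter names are hypothetical.

```python
from typing import Callable


def is_problematic(send_ask: Callable[[], bool], resend_threshold: int) -> bool:
    """Return True if the BMC should be treated as problematic.

    send_ask is a hypothetical callable that sends one ASK signal, waits
    for the waiting time, and returns True if a RESPONSE was received.
    """
    resend_count = 0                         # resends performed so far
    while not send_ask():                    # no RESPONSE within the waiting time
        if resend_count > resend_threshold:  # assumed strict comparison
            return True                      # give up: BMC is problematic
        resend_count += 1                    # resend the same ASK signal
    return False                             # BMC answered; it is healthy
```

Under this reading, a BMC that never answers receives one initial ASK signal plus `resend_threshold + 1` resends before it is declared problematic. A design that instead treats the threshold as "reached" would use `>=` and stop one resend earlier.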

In this embodiment, when determining that the non-responding BMC 32 is a problematic BMC 32, the RMC 22 controls the problematic BMC 32 through the GPIO channel 5. In particular, the RMC 22 sends a control signal (through the control pin) directly to the reset pin 321 of the problematic BMC 32 through the GPIO channel 5, so as to force the problematic BMC 32 to reset.

For example, the RMC 22 is set to output a low potential signal (such as "0"), or not to output any signal, via the control pin in normal operation, and when the above problem occurs, the RMC 22 changes to output a high potential signal (such as "1"). If the problematic BMC 32 receives the high potential signal at the reset pin 321, it is forced to reset.
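A minimal sketch of this signalling convention is given below, using a simulated pin rather than real hardware. The `ControlPin` class, the `force_reset` function and the `hold_seconds` parameter are hypothetical; in particular, returning the pin to the low level after the pulse is an assumption, since the text only states that the RMC changes its output to high.

```python
import time
from typing import List


class ControlPin:
    """Hypothetical model of the RMC control pin wired to reset pin 321."""

    def __init__(self) -> None:
        self.level = 0                   # low potential in normal operation
        self.history: List[int] = [0]    # every level the pin has driven

    def write(self, level: int) -> None:
        self.level = level
        self.history.append(level)


def force_reset(pin: ControlPin, hold_seconds: float = 0.0) -> None:
    """Drive the reset line high, hold it, then release it (assumed)."""
    pin.write(1)                 # high potential forces the BMC to reset
    time.sleep(hold_seconds)     # hold time is an assumed, tunable value
    pin.write(0)                 # assumed: return to the normal low level
```

On real hardware the write would go through whatever GPIO interface the RMC exposes, and the required hold time would come from the BMC's datasheet.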

However, the above description is just a preferred embodiment, and the invention is not limited thereto.

As mentioned above, no matter what problem of the BMC 32 causes the RMC 22 to fail to communicate with the BMC 32 through the default communication channel 4, the RMC 22 can always force the BMC 32 to reset through the GPIO channel 5, so as to recover the BMC 32 to the initial status. Also, the RMC 22 can establish a connection with the BMC 32 again through the default communication channel 4 after the BMC 32 has recovered to the initial status, and can then communicate with the recovered BMC 32 and exchange data with it. There is no need to wait for a manager to fix the above problem manually when the RMC 22 cannot communicate normally with the BMC 32.

In other embodiments, the RMC 22 can interrupt the power provided to the BMC 32 and then restore it through the GPIO channel 5, or interrupt the power provided to the node 3 in which the BMC 32 is arranged and then restore the power to the node 3, so as to reset the BMC 32.

In particular, the rack 1 in this embodiment comprises one or more power control chips (not shown), and the power control chip is electrically connected with the plurality of nodes 3 and a power source of the rack 1. In this embodiment, the RMC 22 connects with the power control chip through the GPIO channel 5. When the RMC 22 cannot communicate with the BMC 32 through the default communication channel 4, it can send a reset command to the power control chip through the GPIO channel 5. The power control chip interrupts the power provided to the node 3 (or to the BMC 32) according to the content of the reset command, and then restores the power to the node 3 (or to the BMC 32) immediately.

Therefore, the BMC 32 can be reset, and can recover to the initial status after the reset action is completed.
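The power-cycling variant can be modelled in a few lines. This is a sketch under stated assumptions: the `PowerControlChip` class, its `handle_reset_command` method and the use of integer node identifiers are hypothetical, and the format of the reset command is not given in the text.

```python
from typing import Dict, List, Tuple


class PowerControlChip:
    """Hypothetical model of the power control chip feeding the nodes 3."""

    def __init__(self, node_ids: List[int]) -> None:
        # Every node is powered in normal operation.
        self.powered: Dict[int, bool] = {node_id: True for node_id in node_ids}
        self.events: List[Tuple[int, str]] = []   # ordered power events

    def handle_reset_command(self, node_id: int) -> None:
        """Interrupt power to one node, then restore it immediately."""
        self.powered[node_id] = False
        self.events.append((node_id, "off"))
        self.powered[node_id] = True
        self.events.append((node_id, "on"))
```

For example, `PowerControlChip([1, 2, 3]).handle_reset_command(2)` power-cycles node 2 only and leaves the other nodes untouched.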

It should be mentioned that the power control chip in this embodiment can control the power provided to all of the nodes 3; if the power is interrupted without permission, it will greatly inconvenience the user. In other embodiments, the RMC 22 can generate and display an alert signal before sending the reset command, and only sends the reset command to the power control chip if the user confirms the alert signal and allows the RMC 22 to execute the reset action. However, the above description is just another preferred embodiment, not intended to limit the scope of the present invention.
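The confirmation step can be sketched as a small guard around the reset command. The names `reset_with_confirmation`, `confirm` and `send_reset_command`, as well as the alert wording, are hypothetical; how the alert is displayed and confirmed is left open by the text.

```python
from typing import Callable


def reset_with_confirmation(
    node_id: int,
    confirm: Callable[[str], bool],
    send_reset_command: Callable[[int], None],
) -> bool:
    """Show an alert and send the reset command only if the user agrees.

    confirm displays the alert and returns the user's decision;
    send_reset_command forwards the command to the power control chip.
    """
    alert = f"BMC in node {node_id} is not responding; power-cycle this node?"
    if not confirm(alert):
        return False             # user declined: power is left untouched
    send_reset_command(node_id)
    return True
```

Passing the confirmation and command functions in as parameters keeps the guard independent of any particular display or GPIO mechanism.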

Fig. 4 is a reset flowchart of a first embodiment according to the present invention. As shown in Fig. 4, before the RMC 22 communicates with the BMCs 32, it first sends the ASK signal to each of the BMCs 32 through the respective default communication channel 4 (step S10). Next, the RMC 22 determines whether it has received the RESPONSE signal corresponding to the ASK signal from each of the BMCs 32 through the respective default communication channel 4 (step S12). After the RMC 22 receives the RESPONSE signal from a BMC 32, it can communicate with that BMC 32 through the corresponding default communication channel 4 (step S14) and transmit the data and files needed by the node 3.
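Steps S10 to S14 amount to polling every BMC over its own channel and separating the ones that answer from the ones that do not. The following is a minimal sketch, assuming a hypothetical `ask` callback that sends the ASK signal to one node and reports whether a RESPONSE arrived within the waiting time; the node identifiers are likewise invented for illustration.

```python
from typing import Callable, List, Tuple

# Hypothetical transport: send ASK to the given node over its default channel
# and return True if a RESPONSE arrived within the waiting time.
AskFn = Callable[[str], bool]


def poll_bmcs(node_ids: List[str], ask: AskFn) -> Tuple[List[str], List[str]]:
    """Send ASK to every BMC (S10) and sort the nodes by outcome (S12)."""
    responded: List[str] = []
    non_responding: List[str] = []
    for node_id in node_ids:
        if ask(node_id):
            responded.append(node_id)       # ready for normal traffic (S14)
        else:
            non_responding.append(node_id)  # candidate for resend or reset
    return responded, non_responding


if __name__ == "__main__":
    # Simulated rack in which the BMC of node "n2" is hung.
    alive = {"n1": True, "n2": False, "n3": True}
    print(poll_bmcs(list(alive), lambda node_id: alive[node_id]))
```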

Following the above description, if the RMC 22 does not receive the RESPONSE signal from one of the BMCs 32 within the waiting time (i.e., the BMCs 32 include at least one non-responding BMC 32), it determines whether the resend time of the ASK signal is longer than the resend threshold (step S16). If the resend time is not yet longer than the resend threshold, the RMC 22 resends the ASK signal to the non-responding BMC 32 through the default communication channel 4 corresponding to that BMC 32; that is, the RMC 22 re-executes steps S10 to S16.
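The resend loop of steps S10 to S16 can be sketched as follows. The text does not say whether the "resend time" is an elapsed duration or a number of attempts; this sketch assumes a count of resends, and the `ask` callback (one ASK plus the wait for a RESPONSE) is hypothetical.

```python
from typing import Callable


def probe_with_resend(ask: Callable[[], bool], resend_threshold: int) -> bool:
    """Return True once a RESPONSE arrives, False once resends are exhausted.

    Assumption: "resend time" is modelled as the number of resends so far.
    """
    resend_count = 0
    while True:
        if ask():                            # S10 and S12: ASK, wait for RESPONSE
            return True
        if resend_count > resend_threshold:  # S16: threshold exceeded
            return False                     # caller proceeds to the reset step
        resend_count += 1                    # not exceeded yet: resend the ASK
```

An elapsed-time interpretation would keep the same structure and compare a monotonic clock reading against the threshold instead of a counter.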

If the resend time of the ASK signal is longer than the resend threshold, the RMC 22 determines that the non-responding BMC 32 has a problem and treats it as the problematic BMC 32. It then sends the control signal to the reset pin 321 of the problematic BMC 32 through the GPIO channel 5, so as to force the problematic BMC 32 to reset (step S18). The RMC 22 then waits for the reset action of the problematic BMC 32 and, after the reset action is completed, communicates with the reset BMC 32 again through the corresponding default communication channel 4 (step S20).
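Steps S18 and S20 can be sketched as a reset pulse followed by a wait for the BMC to answer again. Everything hardware-specific here is an assumption: `set_reset_pin` is a hypothetical callback that asserts or releases the reset line (the real polarity and pulse width depend on the BMC), and the timing defaults are placeholders rather than values from the text.

```python
import time
from typing import Callable


def reset_and_reconnect(
    set_reset_pin: Callable[[bool], None],
    ask: Callable[[], bool],
    pulse_s: float = 0.1,
    boot_timeout_s: float = 60.0,
    poll_interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Pulse the BMC reset pin (S18), then wait until it answers again (S20)."""
    set_reset_pin(True)    # assert reset through the GPIO channel
    sleep(pulse_s)
    set_reset_pin(False)   # release reset so the BMC can reboot

    waited = 0.0
    while waited < boot_timeout_s:
        sleep(poll_interval_s)
        waited += poll_interval_s
        if ask():          # BMC is back on the default communication channel
            return True
    return False           # still silent after the timeout
```

Passing `sleep` as a parameter keeps the function testable without real delays; a caller on actual hardware would leave the default in place.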

By using the rack and the automatic recovery method, the present invention ensures that the RMC in the rack can always control all of the BMCs and restore them to their initial status in any situation, so as to solve the traditional problem that the RMC sometimes cannot communicate with the BMCs through the default communication channels.

Therefore, the present invention helps the rack to solve communication problems by itself, without waiting for the manager to solve them manually.

As the skilled person will appreciate, various changes and modifications can be made to the described embodiment. It is intended to include all such variations, modifications and equivalents which fall within the scope of the present invention, as defined in the accompanying claims.

Claims (10)

1. A rack having an automatic recovery function, comprising: at least one node having a baseboard management controller (BMC); and a management module electrically connected to the node and having a rack management controller (RMC), the RMC communicating with the BMC through a default communication channel; wherein the RMC is electrically connected to the BMC through a general purpose I/O (GPIO) channel, characterized in that the RMC is configured: to determine whether it receives from the BMC a RESPONSE signal in reply to an ASK signal through the default communication channel; to determine whether the resend time of the ASK signal is longer than a resend threshold when the RESPONSE signal is not received; to resend the ASK signal to the BMC through the default communication channel if the resend time is not longer than the resend threshold; and to send to the BMC, through the GPIO channel, a control signal that forces the BMC to reset to its initial state when the resend time is longer than the resend threshold.

2. The rack of claim 1, wherein the RMC includes a control pin, the BMC includes a reset pin, and the control pin of the RMC is electrically connected to the reset pin of the BMC through the GPIO channel.

3. The rack of claim 2, wherein the management module further includes a circuit board, the node further includes a motherboard, the RMC is electrically connected to the circuit board, the BMC is electrically connected to the motherboard, and the control pin of the RMC is electrically connected to the reset pin of the BMC so as to send the control signal through the circuit board and the motherboard.

4. The rack of claim 2, wherein the default communication channel is implemented by an intelligent platform management bus (IPMB), an inter-integrated circuit (I2C) bus, a universal asynchronous receiver/transmitter (UART) or a local area network (LAN).

5. The rack of claim 1, further comprising a power control circuit electrically connected to the node and to a power supply of the rack, wherein the RMC is connected to the power control circuit through a GPIO channel and sends a reset command to the power control circuit if it does not receive the RESPONSE signal from the BMC through the default communication channel, and the power control circuit interrupts the power supply to the node according to the content of the reset command and then restores the power supply to the node.

6. An automatic recovery method for a rack, the rack having a management module and a node electrically connected to the management module, the management module including a rack management controller (RMC), the node including a baseboard management controller (BMC) that communicates with the RMC through a default communication channel, characterized in that the automatic recovery method comprises the following steps:
a) determining, at the RMC, whether reception of a RESPONSE signal from the BMC through the default communication channel fails, which includes:
a1) determining whether the RESPONSE signal is received from the BMC in reply to an ASK signal through the default communication channel;
a2) determining whether the resend time of the ASK signal is longer than a resend threshold when the RESPONSE signal is not received;
a3) resending the ASK signal to the BMC through the default communication channel if the resend time is not longer than the resend threshold;
a4) executing step b if the resend time is longer than the resend threshold; and
b) if the RMC does not receive the RESPONSE signal from the BMC through the default communication channel, sending a control signal to the BMC through a general purpose I/O (GPIO) channel to force the BMC to reset, wherein the RMC and the BMC are electrically connected to each other through the GPIO channel.

7. The automatic recovery method of claim 6, wherein the RMC includes a control pin, the BMC includes a reset pin, and the control pin of the RMC is electrically connected to the reset pin of the BMC through the GPIO channel for sending the control signal.

8. The automatic recovery method of claim 7, further comprising a step a0 before step a: sending the ASK signal from the RMC to the BMC through the default communication channel.

9. The automatic recovery method of claim 8, further comprising a step c: after step b, waiting for the reset action of the BMC and, after the reset action is completed, communicating with the BMC again through the default communication channel.
FI20155123A 2014-12-02 2015-02-24 Rack having automatic recovery function and automatic recovery method for the same FI127498B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW103141816A TWI530778B (en) 2014-12-02 2014-12-02 Rack having automatic recovery function and automatic recovery method for the same

Publications (2)

Publication Number Publication Date
FI20155123A FI20155123A (en) 2016-06-03
FI127498B true FI127498B (en) 2018-07-31

Family

ID=56361511

Family Applications (1)

Application Number Title Priority Date Filing Date
FI20155123A FI127498B (en) 2014-12-02 2015-02-24 Rack having automatic recovery function and automatic recovery method for the same

Country Status (3)

Country Link
FI (1) FI127498B (en)
RU (1) RU2614569C2 (en)
TW (1) TWI530778B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110018725B (en) * 2018-01-09 2023-02-10 佛山市顺德区顺达电脑厂有限公司 Method and system for remotely resetting baseboard management controller of computer system
US10846160B2 (en) 2018-01-12 2020-11-24 Quanta Computer Inc. System and method for remote system recovery
RU2697745C1 (en) * 2018-04-18 2019-08-19 ЭйАйСи ИНК. Intelligent rack and method of managing ip-addresses used therein
CN109240891A (en) * 2018-09-26 2019-01-18 郑州云海信息技术有限公司 A kind of monitoring method and device of SR whole machine cabinet server
RU2709677C1 (en) * 2019-04-09 2019-12-19 ЭйАйСи ИНК. Method of remote abnormal state reset of racks used in data center
RU2711469C1 (en) * 2019-04-09 2020-01-17 ЭйАйСи ИНК. Method of remote abnormal state reset of racks used in data center
RU2710288C1 (en) * 2019-04-09 2019-12-25 ЭйАйСи ИНК. Method of remote abnormal state reset of racks used in data center
CN114116280B (en) * 2021-11-11 2023-08-18 苏州浪潮智能科技有限公司 Interactive BMC self-recovery method, system, terminal and storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090271725A1 (en) * 2008-04-24 2009-10-29 Fred Dirla System and Method for Rack management and Capacity Planning
RU82342U1 (en) * 2008-11-14 2009-04-20 Борис Алексеевич Хозяинов EQUIPMENT CONTROL SYSTEM
CN103092138B (en) * 2011-10-28 2014-12-03 英业达科技有限公司 Control method of equipment cabinet system
RU120260U1 (en) * 2012-01-17 2012-09-10 Общество с ограниченной ответственностью Научно-производственный центр завода "Красное Знамя" (ООО "НПЦ завода "Красное Знамя") COMPUTER RACK
CN104216499B (en) * 2013-05-31 2017-03-08 英业达科技有限公司 Rack and its power control method

Also Published As

Publication number Publication date
TWI530778B (en) 2016-04-21
RU2015109465A (en) 2016-10-10
TW201621539A (en) 2016-06-16
RU2614569C2 (en) 2017-03-28
FI20155123A (en) 2016-06-03

Similar Documents

Publication Publication Date Title
FI127498B (en) Rack having automatic recovery function and automatic recovery method for the same
US20160239370A1 (en) Rack having automatic recovery function and automatic recovery method for the same
US9189316B2 (en) Managing failover in clustered systems, after determining that a node has authority to make a decision on behalf of a sub-cluster
US9665456B2 (en) Apparatus and method for identifying a cause of an error occurring in a network connecting devices within an information processing apparatus
CN109143954B (en) System and method for realizing controller reset
US9026685B2 (en) Memory module communication control
JP2017517060A (en) Fault processing method, related apparatus, and computer
US10275330B2 (en) Computer readable non-transitory recording medium storing pseudo failure generation program, generation method, and generation apparatus
US10691562B2 (en) Management node failover for high reliability systems
US11953976B2 (en) Detecting and recovering from fatal storage errors
US10102088B2 (en) Cluster system, server device, cluster system management method, and computer-readable recording medium
US10055268B2 (en) Detecting high availability readiness of a distributed computing system
CN111538624A (en) Server power supply maintenance method, device, equipment and medium
CN111090537B (en) Cluster starting method and device, electronic equipment and readable storage medium
CN109828855B (en) Multiprocessor error detection system and method thereof
EP1691295A2 (en) Multiplex apparatus and method for multiplexing legacy device
US20070180329A1 (en) Method of latent fault checking a management network
US9405629B2 (en) Information processing system, method for controlling information processing system, and storage medium
CN212541329U (en) Dual-redundancy computer equipment based on domestic Loongson platform
US20110271138A1 (en) System and method for handling system failure
US9454452B2 (en) Information processing apparatus and method for monitoring device by use of first and second communication protocols
CN112084049A (en) Method for monitoring resident program of baseboard management controller
CN107451035B (en) Error state data providing method for computer device
JP2016009499A (en) Methods and systems for managing interconnection
CN111522718A (en) Server power supply system and server

Legal Events

Date Code Title Description
FG Patent granted

Ref document number: 127498

Country of ref document: FI

Kind code of ref document: B

MM Patent lapsed