This application is a divisional application of the invention patent application No. 201310661254.3, filed on December 9, 2013, and entitled "fan error detection system and method".
Disclosure of Invention
In view of the foregoing, it is desirable to provide a fan error detection system and method, which can simultaneously detect whether an abnormality occurs in a fan or a fan controller, and reduce the system temperature of a server by reducing the CPU frequency of the server when the abnormality occurs in the fan or the fan controller.
The fan error detection system runs in a server. The server comprises a BMC controller, the BMC controller connects the server with a fan controller through a communication pin, the fan controller is electrically connected to a fan wall, and one or more groups of fans are installed on the fan wall. The fan error detection system comprises: a potential detection module, configured to determine, when the BMC controller receives an interrupt signal generated by the fan controller, whether the pin voltage of the communication pin has remained at a low potential for longer than a preset period time or at a high potential for longer than the preset period time; an abnormality reporting module, configured to generate abnormal condition information of the rotating speed of the fan when the pin voltage has remained at the low potential for longer than the preset period time, and to generate abnormal condition information of the fan controller when the pin voltage has remained at the high potential for longer than the preset period time; and an abnormality processing module, configured to reduce the system temperature of the server by reducing the CPU frequency of the server.
The fan error detection method runs in a server. The server comprises a BMC controller, the BMC controller connects the server with a fan controller through a communication pin, the fan controller is electrically connected to a fan wall, and one or more groups of fans are installed on the fan wall. The method comprises the following steps: after the server is started, continuously monitoring the pin voltage of the communication pin between the server and the fan controller; when the BMC controller receives an interrupt signal generated by the fan controller, determining whether the pin voltage has remained at a low potential for longer than a preset period time or at a high potential for longer than the preset period time; when the pin voltage has remained at the low potential for longer than the preset period time, generating abnormal condition information of the rotating speed of the fan; when the pin voltage has remained at the high potential for longer than the preset period time, generating abnormal condition information of the fan controller; and reducing the system temperature of the server by reducing the CPU frequency of the server.
Compared with the prior art, the fan error detection system and the fan error detection method are applied to the BMC controller of the server, can detect whether the fan or the fan controller is abnormal or not at the same time, and reduce the system temperature of the server by reducing the CPU frequency of the server when the fan or the fan controller is abnormal.
Drawings
FIG. 1 is a schematic diagram of an operating environment of a fan error detection system according to a preferred embodiment of the present invention.
FIG. 2 is a flow chart of a fan error detection method according to a preferred embodiment of the present invention.
FIG. 3 is a schematic diagram of the potential change of the communication pin between the server and the fan controller.
Description of the main elements
Server 1
BMC controller 10
Fan error detection system 100
Potential detection module 101
Abnormality reporting module 102
Abnormality processing module 103
Communication pin 11
Memory 12
Central processing unit 13
Display 14
Fan controller 2
Fan wall 3
Fan 30
Detailed Description
Referring to FIG. 1, a schematic diagram of the operating environment of a fan error detection system 100 according to a preferred embodiment of the present invention is shown. In this embodiment, the fan error detection system 100 is installed and runs in a server 1, and the server 1 includes, but is not limited to, a BMC controller 10, a memory 12, a central processing unit (CPU) 13, and a display 14. The BMC controller 10 includes a communication pin 11 and connects the server 1 with the fan controller 2 through the communication pin 11, and the fan controller 2 is electrically connected to the fan wall 3. One or more groups of fans 30 are installed on the fan wall 3 to dissipate heat from the server 1 and cool it. The memory 12 may be an internal memory, a flash memory (Flash ROM), a hard disk, or another magnetic disk.
In the present embodiment, when the potential of the communication pin 11 remains high for 5 s, it is determined that the fan controller 2 is damaged; when the potential of the communication pin 11 remains low for 5 s, it is determined that the rotating speed of the fan 30 is abnormal; and when the potential of the communication pin 11 alternates periodically between a high potential lasting 100 ms and a low potential lasting 100 ms, it is determined that both the fan 30 and the fan controller 2 are operating normally. In this way, the operating states of the fan controller 2 and the fan 30 are monitored at the same time.
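The three pin conditions of this embodiment can be illustrated with a minimal Python sketch. It is not the claimed implementation: the names `FanStatus`, `classify_held_level`, and `PRESET_PERIOD_S` are hypothetical, and treating the 5 s threshold as a strict "longer than" comparison is an assumption taken from the wording of step S23.

```python
from enum import Enum

# Threshold used in this embodiment; other values may be chosen (assumption: seconds).
PRESET_PERIOD_S = 5.0


class FanStatus(Enum):
    NORMAL = "fan and fan controller normal"
    FAN_SPEED_ABNORMAL = "fan rotating speed abnormal"
    FAN_CONTROLLER_ABNORMAL = "fan controller abnormal"


def classify_held_level(level_is_high: bool, held_s: float) -> FanStatus:
    """Map how long the pin has stayed at one level to a status."""
    if held_s <= PRESET_PERIOD_S:
        # The pin is still toggling (e.g. 100 ms high / 100 ms low).
        return FanStatus.NORMAL
    if level_is_high:
        # Stuck high for longer than the preset period.
        return FanStatus.FAN_CONTROLLER_ABNORMAL
    # Stuck low for longer than the preset period.
    return FanStatus.FAN_SPEED_ABNORMAL
```

For example, a pin held low for 6 s would be classified as a fan speed abnormality, while a level held for only 0.1 s would be classified as normal.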
The fan error detection system 100 is stored in a flash memory (e.g., Flash ROM) of the BMC controller 10. It determines whether the fan 30 or the fan controller 2 is abnormal by detecting potential changes of the communication pin 11 between the server 1 and the fan controller 2, and reduces the system temperature of the server 1 by reducing the CPU frequency of the server 1. The fan error detection system 100 includes a potential detection module 101, an abnormality reporting module 102, and an abnormality processing module 103. A functional module referred to in the present invention is a series of program instruction segments that can be executed by the central processing unit 13 of the server 1 to perform a fixed function, and that is stored in the memory 12 of the server 1 or in the flash memory of the BMC controller 10. The functional modules 101-103 are described below with reference to the flow chart of FIG. 2 and the schematic diagram of FIG. 3.
FIG. 2 is a flow chart of a fan error detection method according to a preferred embodiment of the present invention. In this embodiment, the method is applied to the BMC controller 10 of the server 1 and can simultaneously detect whether the fan 30 or the fan controller 2 is abnormal. When the fan 30 or the fan controller 2 is abnormal, the CPU frequency of the server 1 is reduced so as to automatically reduce the system temperature of the server 1.
In step S21, after the server 1 is powered on, the potential detection module 101 continuously monitors the pin voltage of the communication pin 11 between the server 1 and the fan controller 2. Referring to FIG. 3A, when the pin voltage of the communication pin 11 alternates periodically between a high potential held for a preset time (e.g., 100 ms) and a low potential held for a preset time (e.g., 100 ms), the potential detection module 101 determines that the fan 30 and the fan controller 2 are both operating normally.
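As an illustration of the normal waveform checked in step S21, the following sketch collapses sampled pin levels into runs and tests whether each run lasts about 100 ms. The function names, the 10 ms sampling period, and the 20 ms tolerance are assumptions made for the example; the embodiment does not specify how the pin is sampled.

```python
from itertools import groupby
from typing import Iterable, List, Tuple


def level_runs(samples: Iterable[int], sample_period_ms: int) -> List[Tuple[int, int]]:
    """Collapse consecutive equal samples into (level, duration_ms) runs."""
    return [
        (level, sum(1 for _ in group) * sample_period_ms)
        for level, group in groupby(samples)
    ]


def looks_like_heartbeat(
    samples: Iterable[int],
    sample_period_ms: int = 10,   # assumed sampling period
    expected_ms: int = 100,       # high/low time from this embodiment
    tolerance_ms: int = 20,       # assumed tolerance
) -> bool:
    """Return True if every complete run lasts roughly expected_ms."""
    runs = level_runs(samples, sample_period_ms)
    # The first and last runs may be cut off by the capture window, so skip them.
    inner = runs[1:-1]
    return bool(inner) and all(
        abs(duration - expected_ms) <= tolerance_ms for _, duration in inner
    )
```

With 10 ms samples, three cycles of ten high samples followed by ten low samples pass the check, whereas a capture dominated by a single 5 s low run does not.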
In step S22, the potential detection module 101 determines whether the BMC controller 10 has received an interrupt signal generated by the fan controller 2. In this embodiment, when the rotating speed of the fan 30 is too low or the fan 30 is unplugged, the fan controller 2 triggers an interrupt, and the interrupt pulls down the pin voltage of the communication pin 11 between the server 1 and the fan controller 2. The BMC controller 10 receives the interrupt signal by monitoring the pin voltage of the communication pin 11, and thus knows that the pin voltage of the communication pin 11 is at a low potential. If the BMC controller 10 receives the interrupt signal generated by the fan controller 2, the process goes to step S23; if the BMC controller 10 does not receive the interrupt signal generated by the fan controller 2, the process returns to step S21.
In step S23, the potential detection module 101 determines whether the pin voltage has remained at a low potential for longer than a preset period time or at a high potential for longer than the preset period time. In this embodiment, the preset period time may be defined as 5 s, or another period time may be defined according to customer requirements. If the pin voltage of the communication pin 11 has remained at the low potential for longer than the preset period time, the process goes to step S24; if the pin voltage has remained at the high potential for longer than the preset period time, the process goes to step S25.
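The duration test of step S23 can be sketched as a small timer that remembers when the pin level last changed, plus a routing function that selects the next step. `PinHoldTimer` and `next_step` are hypothetical names. The embodiment does not state what happens when neither condition is met, so returning `None` (keep monitoring) is an assumption of this sketch.

```python
from typing import Optional


class PinHoldTimer:
    """Tracks how long the pin has stayed at its current level."""

    def __init__(self, initial_level: int, now_s: float) -> None:
        self.level = initial_level
        self.since_s = now_s

    def update(self, level: int, now_s: float) -> float:
        """Record a sample and return the time the current level has been held."""
        if level != self.level:
            # Level changed: restart the hold timer at this sample.
            self.level = level
            self.since_s = now_s
        return now_s - self.since_s


def next_step(level: int, held_s: float, preset_period_s: float = 5.0) -> Optional[str]:
    """Return the step to branch to, or None if neither condition is met."""
    if held_s <= preset_period_s:
        return None  # assumption: monitoring simply continues
    return "S25" if level == 1 else "S24"
```

Timestamps are passed in explicitly rather than read from a clock, which keeps the sketch deterministic and easy to check.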
In step S24, the abnormality reporting module 102 generates abnormal condition information of the fan rotating speed and displays it on the display 14 of the server 1. Referring to FIG. 3B, when the pin voltage of the communication pin 11 has remained at the low potential for longer than 5 s, the abnormality reporting module 102 issues the abnormal condition information indicating that the rotating speed of the fan 30 is abnormal, records a system event log in the memory of the BMC controller 10, and displays the system event log on the display 14 to report the abnormal condition of the fan 30 to the system administrator.
In step S25, the abnormality reporting module 102 generates abnormal condition information of the fan controller 2 and displays it on the display 14 of the server 1. Referring to FIG. 3C, when the pin voltage of the communication pin 11 has remained at the high potential for longer than 5 s, the abnormality reporting module 102 issues the abnormal condition information of the fan controller 2, records a system event log in the memory of the BMC controller 10, and displays the system event log on the display 14 of the server 1 to report the abnormal condition of the fan controller 2 to the system administrator.
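The reporting behavior of steps S24 and S25 (record an event, then produce a line for display) might be sketched as follows. `EventRecord` and `EventLog` are hypothetical in-memory stand-ins; they do not reproduce the actual system event log format of any BMC, and the display line format is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class EventRecord:
    timestamp_s: float
    component: str
    message: str


class EventLog:
    """Hypothetical in-memory stand-in for the BMC system event log."""

    def __init__(self) -> None:
        self.records: List[EventRecord] = []

    def report(self, component: str, message: str, timestamp_s: float) -> str:
        """Store a record and return the line that would be shown on the display."""
        self.records.append(EventRecord(timestamp_s, component, message))
        return f"[{timestamp_s:.1f}s] {component}: {message}"
```

The same `report` call would serve both steps; only the component name and message differ between a fan speed abnormality and a fan controller abnormality.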
In step S26, the abnormality processing module 103 reduces the system temperature of the server 1 by reducing the CPU frequency of the server 1. In this embodiment, the BMC controller 10 triggers the processor hot pin (Processor Hot Pin) of the CPU 13 through the PECI interface, so that the CPU frequency drops to its minimum in the shortest possible time. This quickly reduces the system temperature of the server 1 and prevents the server 1 from being damaged by high temperature.
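Step S26 can be illustrated with a sketch in which the hardware action is injected as a callback. `AbnormalityProcessor` and `assert_processor_hot` are hypothetical names, and no real PECI or pin-control API is shown. Triggering the pin only once per abnormality is a design assumption of this sketch, not something the embodiment states.

```python
from typing import Callable


class AbnormalityProcessor:
    """Hypothetical stand-in for the abnormality processing module 103."""

    def __init__(self, assert_processor_hot: Callable[[], None]) -> None:
        # Callback that would trigger the processor hot pin on real hardware.
        self._assert_processor_hot = assert_processor_hot
        self.throttled = False

    def handle(self, abnormal: bool) -> bool:
        """Throttle the CPU on the first abnormality; return True if it acted."""
        if abnormal and not self.throttled:
            self._assert_processor_hot()
            self.throttled = True
            return True
        return False
```

Passing the hardware action as a callback lets the decision logic be exercised without a BMC, for example by substituting a function that appends to a list.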
Although the present invention has been described in detail with reference to the preferred embodiments, it will be understood by those skilled in the art that various changes may be made and equivalents may be substituted without departing from the spirit and scope of the invention.