CN108181977B - Server - Google Patents

Info

Publication number
CN108181977B
Authority
CN
China
Prior art keywords
fan
controller
server
abnormal
pin voltage
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810128654.0A
Other languages
Chinese (zh)
Other versions
CN108181977A (en)
Inventor
Inventor name withheld
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Shenzhou Digital Cloud Information Technology Co ltd
Shenzhou Kuntai Xiamen Information Technology Co ltd
Original Assignee
Beijing Shenzhou Digital Cloud Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Shenzhou Digital Cloud Information Technology Co ltd
Priority to CN201810128654.0A
Publication of CN108181977A
Application granted
Publication of CN108181977B
Active legal status
Anticipated expiration

Landscapes

  • Cooling Or The Like Of Electrical Apparatus (AREA)
  • Debugging And Monitoring (AREA)

Abstract

A server includes a BMC controller that connects the server to a fan controller through a communication pin. The fan controller is electrically connected to a fan wall on which one or more groups of fans are mounted. A fan error detection system comprises a potential detection module, an abnormality reporting module, and an abnormality processing module. The potential detection module continuously monitors the pin voltage of the communication pin between the server and the fan controller; the abnormality reporting module determines from the pin voltage whether the fan or the fan controller is abnormal; and when either is abnormal, the abnormality processing module lowers the system temperature of the server by reducing the CPU frequency of the server. The invention thus makes it possible to detect abnormalities of the fan and of the fan controller at the same time, and to lower the system temperature of the server by reducing its CPU frequency when either is abnormal.

Description

Server
This application is a divisional application of invention patent application No. 201310661254.3, filed on December 9, 2013 and entitled "Fan error detection system and method".
Technical Field
The present invention relates to a server heat dissipation monitoring system and method, and more particularly, to a server fan error detection system and method.
Background
In a server development project, fan control is generally shared at the cabinet level: the plurality of (39) servers in the whole cabinet share one group of fan walls. The fan wall comprises a set of fan controllers and 30 sets of fans. Of these servers, 3 are connected to the fan controller so that they can learn whether the fans are working normally.
The truth table used to judge the running state of the fan and the fan controller has the following meaning: when the binary value of GPIOO3_TACH3_VPR1 is HIGH, the fan controller is abnormal; when the binary value of GPIOO3_TACH3_VPR1 is LOW, the fan speed is abnormal, where the abnormal conditions include a speed that is too slow, a speed that is too fast, and a disconnected cable. Under this binary judgment, the system cannot monitor the operating conditions of the fan controller and the fan at the same time. Because the customer insists on using this discrete signal to judge whether fan control is normal, it becomes very important to provide an effective method by which the server can monitor the operating conditions of the fan controller and the fan at the same time.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a fan error detection system and method, which can simultaneously detect whether an abnormality occurs in a fan or a fan controller, and reduce the system temperature of a server by reducing the CPU frequency of the server when the abnormality occurs in the fan or the fan controller.
The fan error detection system runs in a server. The server comprises a BMC controller, which connects the server to a fan controller through a communication pin; the fan controller is electrically connected to a fan wall, and one or more groups of fans are installed on the fan wall. The fan error detection system comprises: a potential detection module, configured to continuously monitor the pin voltage of the communication pin between the server and the fan controller after the server is started and, when the BMC controller receives an interrupt signal generated by the fan controller, to determine whether the pin voltage has stayed at a low potential for longer than a preset period time or at a high potential for longer than the preset period time; an abnormality reporting module, configured to generate fan speed abnormality information when the pin voltage has stayed at the low potential for longer than the preset period time, and to generate fan controller abnormality information when the pin voltage has stayed at the high potential for longer than the preset period time; and an abnormality processing module, configured to lower the system temperature of the server by reducing the CPU frequency of the server.
The fan error detection method runs in a server. The server comprises a BMC controller, which connects the server to a fan controller through a communication pin; the fan controller is electrically connected to a fan wall, and one or more groups of fans are installed on the fan wall. The method comprises the following steps: after the server is started, continuously monitoring the pin voltage of the communication pin between the server and the fan controller; when the BMC controller receives an interrupt signal generated by the fan controller, determining whether the pin voltage has stayed at a low potential for longer than a preset period time or at a high potential for longer than the preset period time; when the pin voltage has stayed at the low potential for longer than the preset period time, generating fan speed abnormality information; when the pin voltage has stayed at the high potential for longer than the preset period time, generating fan controller abnormality information; and lowering the system temperature of the server by reducing the CPU frequency of the server.
Compared with the prior art, the fan error detection system and the fan error detection method are applied to the BMC controller of the server, can detect whether the fan or the fan controller is abnormal or not at the same time, and reduce the system temperature of the server by reducing the CPU frequency of the server when the fan or the fan controller is abnormal.
Drawings
FIG. 1 is a schematic diagram of an operating environment of a fan error detection system according to a preferred embodiment of the present invention.
FIG. 2 is a flow chart of a fan error detection method according to a preferred embodiment of the present invention.
FIG. 3 is a schematic diagram of the potential change of the communication pin between the server and the fan controller.
Description of the main elements
Server 1
BMC controller 10
Fan error detection system 100
Potential detection module 101
Abnormality reporting module 102
Abnormality processing module 103
Communication pin 11
Memory 12
Central processing unit 13
Display 14
Fan controller 2
Fan wall 3
Fan 30
Detailed Description
FIG. 1 is a schematic diagram of the operating environment of a fan error detection system 100 according to a preferred embodiment of the invention. In this embodiment, the fan error detection system 100 is installed and runs in a server 1, which includes, but is not limited to, a BMC controller 10, a memory 12, a central processing unit (CPU) 13, and a display 14. The BMC controller 10 includes a communication pin 11 and connects the server 1 to the fan controller 2 through the communication pin 11; the fan controller 2 is electrically connected to the fan wall 3. One or more groups of fans 30 are mounted on the fan wall 3 to cool the server 1. The memory 12 may be an internal memory, a flash memory (Flash ROM), a hard disk, or another magnetic disk.
In this embodiment, when the potential of the communication pin 11 stays high for 5 s, the fan controller 2 is determined to be damaged; when the potential of the communication pin 11 stays low for 5 s, the rotating speed of the fan 30 is determined to be abnormal; and when the potential of the communication pin 11 alternates periodically between a high potential lasting 100 ms and a low potential lasting 100 ms, both the fan 30 and the fan controller 2 are determined to be operating normally. In this way the operating states of the fan controller 2 and the fan 30 are effectively monitored at the same time.
The fan error detection system 100 is stored in a flash memory (e.g., Flash ROM) of the BMC controller 10. By detecting the potential change of the communication pin 11 between the server 1 and the fan controller 2, it determines whether the fan 30 or the fan controller 2 is abnormal, and it lowers the system temperature of the server 1 by reducing the CPU frequency of the server 1. The fan error detection system 100 includes a potential detection module 101, an abnormality reporting module 102, and an abnormality processing module 103. A functional module, as referred to in the present invention, is a series of program instruction segments that can be executed by the central processing unit 13 of the server 1 to perform a fixed function, and that is stored in the memory 12 of the server 1 or in the flash memory of the BMC controller 10. The functional modules 101 to 103 are described with reference to FIG. 2 and FIG. 3.
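The three cases in this embodiment amount to a small truth table over the pin level and the time the pin has held that level. The following is a minimal sketch of that mapping, not the patented implementation; the names `PinState` and `classify` are hypothetical, and the comparison uses "longer than" the 5 s period, following the wording of the method steps.

```python
from enum import Enum


class PinState(Enum):
    NO_ABNORMALITY = "no abnormality detected"
    FAN_SPEED_ABNORMAL = "fan speed abnormal"
    CONTROLLER_ABNORMAL = "fan controller abnormal"


PRESET_PERIOD_S = 5.0  # preset period time used in the embodiment


def classify(level: int, held_s: float, preset_s: float = PRESET_PERIOD_S) -> PinState:
    """Map a pin level (1 = high, 0 = low) and its hold time to a state.

    A level that has not been held longer than the preset period is
    treated as "still toggling", which is an assumption of this sketch.
    """
    if held_s <= preset_s:
        return PinState.NO_ABNORMALITY
    if level == 1:
        return PinState.CONTROLLER_ABNORMAL
    return PinState.FAN_SPEED_ABNORMAL
```

Because a healthy pin toggles every 100 ms, its hold time never approaches the preset period, so a single discrete signal can carry three states instead of two.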
FIG. 2 is a flow chart of a fan error detection method according to a preferred embodiment of the present invention. In this embodiment, the method is applied to the BMC controller 10 of the server 1, and can detect whether the fan 30 and the fan controller 2 are abnormal at the same time, and when the fan 30 or the fan controller 2 is abnormal, the CPU frequency of the server 1 is reduced to automatically reduce the system temperature of the server 1.
In step S21, after the server 1 is powered on, the potential detection module 101 continuously monitors the pin voltage of the communication pin 11 between the server 1 and the fan controller 2. Referring to FIG. 3A, when the pin voltage of the communication pin 11 alternates periodically between a high potential held for a preset time (e.g., 100 ms) and a low potential held for a preset time (e.g., 100 ms), the potential detection module 101 determines that the fan 30 and the fan controller 2 are both operating normally.
In step S22, the potential detection module 101 determines whether the BMC controller 10 has received an interrupt signal generated by the fan controller 2. In this embodiment, when the rotation speed of the fan 30 is too low or the fan 30 is unplugged, the fan controller 2 triggers an interrupt, and the interrupt pulls down the pin voltage of the communication pin 11 between the server 1 and the fan controller 2. The BMC controller 10 receives the interrupt signal by monitoring the pin voltage of the communication pin 11, and thereby learns that the pin voltage of the communication pin 11 is at a low potential. If the BMC controller 10 receives the interrupt signal generated by the fan controller 2, the process goes to step S23; otherwise, the process returns to step S21.
In step S23, the potential detection module 101 determines whether the pin voltage has stayed at a low potential for longer than a preset period time or at a high potential for longer than the preset period time. In this embodiment, the preset period time may be defined as 5 s, or another period time may be defined according to customer requirements. If the pin voltage of the communication pin 11 has stayed at the low potential for longer than the preset period time, the process goes to step S24; if the pin voltage has stayed at the high potential for longer than the preset period time, the process goes to step S25.
In step S24, the abnormality reporting module 102 generates fan speed abnormality information and displays it on the display 14 of the server 1. Referring to FIG. 3B, when the pin voltage of the communication pin 11 has stayed low for longer than 5 s, the abnormality reporting module 102 issues abnormality information indicating that the rotation speed of the fan 30 is abnormal, records a system event log entry in the memory of the BMC controller 10, and displays the system event log on the display 14 to report the abnormal condition of the fan 30 to the system administrator.
In step S25, the abnormality reporting module 102 generates fan controller abnormality information and displays it on the display 14 of the server 1. Referring to FIG. 3C, when the pin voltage of the communication pin 11 has stayed high for longer than 5 s, the abnormality reporting module 102 issues abnormality information for the fan controller 2, records a system event log entry in the memory of the BMC controller 10, and displays the system event log on the display 14 of the server 1 to report the abnormal condition of the fan controller 2 to the system administrator.
In step S26, the abnormality processing module 103 lowers the system temperature of the server 1 by reducing the CPU frequency of the server 1. In this embodiment, the BMC controller 10 triggers the processor-hot pin (Processor Hot Pin) of the CPU 13 through the PECI interface so that the CPU frequency drops to its minimum in the shortest time, which quickly lowers the system temperature of the server 1 and prevents the server 1 from being damaged by high temperature.
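One way to picture the normal-state check in step S21 is to collapse a sampled waveform into runs of equal level and confirm that each complete run lasts about the preset time. The sketch below assumes the pin is sampled at a fixed rate; the function names, the 10 ms sample period, and the 20 ms tolerance are all hypothetical choices for illustration, not values from the patent.

```python
from itertools import groupby


def run_lengths_ms(samples, sample_period_ms):
    """Collapse a sequence of 0/1 samples into (level, duration_ms) runs."""
    return [
        (level, sum(1 for _ in group) * sample_period_ms)
        for level, group in groupby(samples)
    ]


def is_normal_heartbeat(samples, sample_period_ms=10, expected_ms=100, tolerance_ms=20):
    """Return True when every complete run lasts roughly expected_ms.

    The first and last runs are ignored because the sampling window may
    have cut them short.
    """
    inner = run_lengths_ms(samples, sample_period_ms)[1:-1]
    return len(inner) >= 2 and all(
        abs(duration - expected_ms) <= tolerance_ms for _, duration in inner
    )
```

A pin that is stuck at one level produces a single run, so the check fails and the longer-duration test of the later steps decides which component is at fault.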
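Step S23 needs to know how long the pin has held its current level. A simple way to track this, sketched below under the assumption that the firmware sees timestamped level readings, is to remember the time of the last level change. `LevelTimer` and `next_step` are hypothetical names, and returning to S21 when neither condition holds is an assumption, since the text only states the two abnormal branches.

```python
class LevelTimer:
    """Track how long the pin has stayed at its current level."""

    def __init__(self, level: int, now_s: float):
        self.level = level
        self.since_s = now_s

    def update(self, level: int, now_s: float) -> float:
        """Record a reading and return the hold time of the current level."""
        if level != self.level:
            # Level changed: restart the hold timer at this edge.
            self.level = level
            self.since_s = now_s
        return now_s - self.since_s


def next_step(level: int, held_s: float, preset_s: float = 5.0) -> str:
    """Route to S24 (fan speed) or S25 (fan controller), else back to S21."""
    if held_s > preset_s:
        return "S24" if level == 0 else "S25"
    return "S21"
```

For example, a pin that goes low at 0.1 s and is still low at 5.3 s has held the low level for about 5.2 s, which routes to step S24.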
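Steps S24 and S25 both record an entry and show it to the administrator. The sketch below illustrates that record-then-display pattern with an in-memory list; it is not the BMC's actual system event log format, and `EventRecord`, `SystemEventLog`, and the `display` callback are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class EventRecord:
    source: str   # component the entry refers to, e.g. "fan 30"
    message: str  # abnormality information text
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class SystemEventLog:
    """In-memory stand-in for the log kept in the BMC controller's memory."""

    def __init__(self):
        self.records = []

    def report(self, source: str, message: str, display=print) -> EventRecord:
        """Store the entry, then pass a formatted line to the display callback."""
        record = EventRecord(source, message)
        self.records.append(record)
        display(f"[{record.timestamp.isoformat()}] {source}: {message}")
        return record
```

Passing the display as a callback keeps the logging step independent of how the server actually presents the message.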
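Taken together, steps S21 to S26 can be simulated over a list of pin samples. The sketch below is a simplified illustration: it omits the interrupt check of step S22 and treats any level held longer than the preset period as the trigger, uses integer milliseconds to avoid floating-point drift, and stands in for the hardware throttling with a caller-supplied `throttle` callback. All names are hypothetical.

```python
def run_detection(samples, sample_period_ms, preset_ms, report, throttle):
    """Scan 0/1 pin samples; report and throttle on the first abnormality.

    Returns True if an abnormality was handled, False otherwise.
    """
    level = None
    held_ms = 0
    for sample in samples:
        if sample == level:
            held_ms += sample_period_ms
        else:
            # Level changed: restart the hold time at one sample period.
            level, held_ms = sample, sample_period_ms
        if held_ms > preset_ms:
            # S24 for a low level, S25 for a high level.
            report("fan speed abnormal" if level == 0 else "fan controller abnormal")
            throttle()  # S26: reduce the CPU frequency
            return True
    return False
```

With a 100 ms sample period and a 5000 ms preset period, a waveform that toggles on every sample never triggers, while a pin that stays high for 51 consecutive samples produces one "fan controller abnormal" report followed by one throttle call.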
Although the present invention has been described in detail with reference to the preferred embodiments, it will be understood by those skilled in the art that various changes may be made and equivalents may be substituted without departing from the spirit and scope of the invention.

Claims (5)

1. A server, comprising a BMC controller, wherein the BMC controller comprises a Flash memory, a fan error detection system is stored in the Flash memory of the BMC controller, the BMC controller connects the server to a fan controller through a communication pin, the fan controller is electrically connected to a fan wall, and one or more groups of fans are mounted on the fan wall, characterized in that the fan error detection system comprises:
a potential detection module, configured to continuously monitor the pin voltage of the communication pin between the server and the fan controller after the server is started;
an abnormality reporting module, configured to generate abnormality information when the fan or the fan controller is abnormal; and
an abnormality processing module, configured to lower the system temperature of the server by reducing the CPU frequency of the server;
wherein, when the BMC controller receives an interrupt signal generated by the fan controller, the potential detection module determines whether the pin voltage has stayed at a low potential for longer than a preset period time or at a high potential for longer than the preset period time; when the pin voltage alternates periodically between a high potential held for a preset time and a low potential held for a preset time, the potential detection module determines that the fan and the fan controller are operating normally; and when the fan speed is too low or the fan is unplugged, the fan controller triggers an interrupt to generate the interrupt signal, and the interrupt signal pulls the pin voltage down to the low potential.
2. The server according to claim 1, wherein the abnormality reporting module generates the status information that the fan rotation speed is abnormal when the duration of the pin voltage being the low potential is longer than a preset cycle time.
3. The server according to claim 2, wherein the abnormality reporting module generates the condition information that the fan controller is abnormal when the duration of the pin voltage being high is longer than a preset period time.
4. The server according to claim 3, wherein when the duration of the pin voltage being at the low level is longer than a preset period time, the abnormality reporting module records the abnormal fan speed status information in a system event log of the BMC controller, and displays the abnormal fan speed status information on a display of the server.
5. The server according to claim 4, wherein when the duration of the pin voltage being high is longer than a preset period time, the abnormality reporting module records the abnormal condition information of the fan controller in a system event log of the BMC controller and displays the abnormal condition information of the fan controller on a display of the server.
CN201810128654.0A 2013-12-09 2013-12-09 Server Active CN108181977B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810128654.0A CN108181977B (en) 2013-12-09 2013-12-09 Server

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201310661254.3A CN104699589B (en) 2013-12-09 2013-12-09 Fan fault detection system and method
CN201810128654.0A CN108181977B (en) 2013-12-09 2013-12-09 Server

Related Parent Applications (1)

Application Number Priority Date Filing Date Title
CN201310661254.3A Division CN104699589B (en) 2013-12-09 2013-12-09 Fan fault detection system and method

Publications (2)

Publication Number Publication Date
CN108181977A CN108181977A (en) 2018-06-19
CN108181977B true CN108181977B (en) 2020-11-24

Family

ID=53346746

Family Applications (2)

Application Number Priority Date Filing Date Title
CN201810128654.0A Active CN108181977B (en) 2013-12-09 2013-12-09 Server
CN201310661254.3A Active CN104699589B (en) 2013-12-09 2013-12-09 Fan fault detection system and method

Family Applications After (1)

Application Number Priority Date Filing Date Title
CN201310661254.3A Active CN104699589B (en) 2013-12-09 2013-12-09 Fan fault detection system and method

Country Status (1)

Country Link
CN (2) CN108181977B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105426289A (en) * 2015-11-12 2016-03-23 姚焕根 Baseboard management controller and method for monitoring fan and fan controller
CN106682162B (en) * 2016-12-26 2021-03-09 浙江宇视科技有限公司 Log management method and device
CN109491813B (en) * 2017-09-11 2022-07-08 技嘉科技股份有限公司 ARM architecture server and management method thereof
CN107656852A (en) * 2017-10-24 2018-02-02 郑州云海信息技术有限公司 A kind of server multi-fan detecting fault control device and control method
CN108983922A (en) * 2018-06-27 2018-12-11 紫光华山信息技术有限公司 Working frequency adjusting method, device and server
CN109388210B (en) * 2018-12-06 2024-03-29 京信网络系统股份有限公司 Distributed chassis, and management method and device of distributed chassis
CN113550928B (en) * 2021-09-09 2023-08-11 迈普通信技术股份有限公司 Fan control method and electronic equipment
CN113884946A (en) * 2021-09-14 2022-01-04 科华数据股份有限公司 Fan abnormity monitoring method and device

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1512293A (en) * 2002-12-31 2004-07-14 联想(北京)有限公司 Method for computer radiation and control noise
CN201039383Y (en) * 2007-01-19 2008-03-19 青岛海信电器股份有限公司 Self protection fan failure detection circuit and TV set with this circuit
CN101165354A (en) * 2006-10-18 2008-04-23 鸿富锦精密工业(深圳)有限公司 Fan rotation speed automatic control circuit
CN101328901A (en) * 2008-07-25 2008-12-24 华为技术有限公司 Apparatus and method for detecting fan fault
CN101882101A (en) * 2010-07-02 2010-11-10 深圳市顶星数码网络技术有限公司 Temperature monitoring prompt system and notebook computer
CN103309426A (en) * 2012-03-12 2013-09-18 鸿富锦精密工业(深圳)有限公司 Server

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070211430A1 (en) * 2006-01-13 2007-09-13 Sun Microsystems, Inc. Compact rackmount server
CN101430659A (en) * 2007-11-05 2009-05-13 英业达股份有限公司 Management method and system for monitoring chip of system management bus
CN102999414B (en) * 2011-09-14 2016-04-20 赛恩倍吉科技顾问(深圳)有限公司 Fan circuit for detecting
CN103186452A (en) * 2011-12-27 2013-07-03 鸿富锦精密工业(深圳)有限公司 Server system
CN102521109B (en) * 2011-12-31 2016-01-13 曙光信息产业股份有限公司 Method for monitoring states of server

Also Published As

Publication number Publication date
CN108181977A (en) 2018-06-19
CN104699589B (en) 2018-01-23
CN104699589A (en) 2015-06-10

Similar Documents

Publication Publication Date Title
CN108181977B (en) Server
US7346468B2 (en) Method and apparatus for detecting heat sink faults
JP6008070B1 (en) Operation management apparatus, operation management method, and recording medium on which operation management program is recorded
US10519960B2 (en) Fan failure detection and reporting
CN111486121B (en) Fan operation state diagnostic device and method thereof
US20150193325A1 (en) Method and system for determining hardware life expectancy and failure prevention
US20060142901A1 (en) Microcontroller methods of improving reliability in DC brushless motors and cooling fans
US20180164795A1 (en) Fan monitoring system
TW201523239A (en) System and method for detecting working status of fans and fan controller
CN111124827B (en) Monitoring device and monitoring method for equipment fan
CN104639380A (en) Server monitoring method
CN109871692B (en) Over-temperature power failure protection method, logic device, service board and network system
JP2013168107A (en) Information processing device, abnormality detection method, and program
WO2020000760A1 (en) Server management method and device, computer apparatus, and storage medium
CN114153693A (en) Server fan state monitoring method and device and storage medium
CN106227313A (en) The duty of a kind of DC radiation fan determines method and device
CN112667470A (en) System, method and medium for evaluating and detecting server power
TW201428487A (en) Testing system and testing method thereof
TW201416854A (en) System and method for adjusting a speed of a cursor of a mouse
CN105426289A (en) Baseboard management controller and method for monitoring fan and fan controller
TW201530304A (en) Method for alarming abnormal status
JP6800935B2 (en) How to control a fan in an electronic system
CN112131048A (en) Control method and device for server indicator lamp
WO2017072904A1 (en) Computer system and failure detection method
CN110377450A (en) A kind of hardware anomalies processing method, system and associated component

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20201103

Address after: No.301, 3 / F, No.9, shangdijiu street, Haidian District, Beijing

Applicant after: Beijing Shenzhou Digital Cloud Information Technology Co.,Ltd.

Address before: 362000 No. 120 Shanyao Longzhuang Commercial and Residential Building, Quanzhou Quangang District, Fujian Province

Applicant before: QUANZHOU QUANGANG KAIWEI INFORMATION TECHNOLOGY CONSULTING Co.,Ltd.

GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20240729

Address after: 100085 No.301, 3rd floor, 9 shangdijiu street, Haidian District, Beijing

Patentee after: Beijing Shenzhou Digital Cloud Information Technology Co.,Ltd.

Country or region after: China

Patentee after: Shenzhou Kuntai (Xiamen) Information Technology Co.,Ltd.

Address before: 100085 No.301, 3rd floor, 9 shangdijiu street, Haidian District, Beijing

Patentee before: Beijing Shenzhou Digital Cloud Information Technology Co.,Ltd.

Country or region before: China
